
  • 7/29/2019 10.1.1.186.4635

    1/91

    Equalizing Filter Design for Cross-talk Cancellation

    by

    Jihong Ren

    B. Sc. (Electrical Engineering), Huazhong University of Science and Technology, 1995

    M. Eng. (Electrical Engineering), Huazhong University of Science and Technology, 1998

    M. Sc. (Neuroscience), The University of British Columbia, 2000

    A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

    THE REQUIREMENTS FOR THE DEGREE OF

    Master of Science

    in

    THE FACULTY OF GRADUATE STUDIES

    (Department of Computer Science)

    we accept this thesis as conforming to the required standard

    The University of British Columbia

    June 2002

    © Jihong Ren, 2002


    Abstract

    As interconnect line width and spacing decrease and operating clock rates increase, interconnect has become a bottleneck in developing high-speed integrated circuits, multichip modules, printed circuit boards, and systems. With small line spacing, mutual capacitance and inductance approach the level of self-capacitance and self-inductance, and can severely degrade signal integrity. The well-known equalizing filter method can significantly improve signal integrity. This thesis explores the effectiveness of equalizing filters in cross-talk cancellation for high-speed, off-chip buses. It demonstrates that linear programming provides effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high-speed digital buses.


    Contents

    Abstract
    Contents
    List of Tables
    List of Figures
    Acknowledgments
    1 Introduction
        1.1 Motivation
        1.2 Method and Proposed System Structure
        1.3 Contributions
        1.4 Thesis Outline
    2 Background
        2.1 Transmission channel limitations
        2.2 Equalization
            2.2.1 Design Methods
            2.2.2 Application of equalizing filters in cross-talk cancellation for the local telephone subscriber loop
        2.3 Summary
    3 Coupled Distributed RLC Interconnect Model
        3.1 Interconnect Model
        3.2 Bus parameters and Simulation results
    4 Linear Equalizing Filter Design
        4.1 Measurements of filter performance
        4.2 Preliminaries
            4.2.1 Convolution
            4.2.2 Matrix Representations of Convolution
        4.3 Least Squares Optimization Method with Pseudo-random Input
            4.3.1 Motivation
            4.3.2 Least Squares problem formulation
            4.3.3 An example
        4.4 Linear Programming Method with Worst-case Input
            4.4.1 Motivation
            4.4.2 Linear Programming Problem formulation
            4.4.3 Smoothing filter
            4.4.4 An example
        4.5 Testing results: Comparison of LSQ method and LP method
            4.5.1 Worst-case input sequence
            4.5.2 Indirect coupling
            4.5.3 Over-fitting
            4.5.4 Minimum bit time
        4.6 Time-variant Linear FIR Filter
        4.7 Optimized Smoothing Filter
        4.8 Summary
    5 Predictor-Corrector Algorithm with Model Reduction
        5.1 Mehrotra's predictor-corrector algorithm
        5.2 Implementation Details
            5.2.1 Starting and Stopping
            5.2.2 Solving the linear systems
        5.3 Ill-conditioning and Model Reduction
        5.4 Results
    6 Conclusions and Future Work
        6.1 Future work
    Bibliography


    List of Tables

    4.1 Performance of equalizing filters with different sizes for a bus 32 bits wide and 5 cm long.
    4.2 Performance of equalizing filters with different sizes for buses 32 bits wide. All filters designed using the LP method.
    4.3 Performance of different smoothing filters with equalizing filters designed by the LP method at 300 ps.
    5.1 linprog() iteration display
    5.2 Iteration display of our approach: Mehrotra interior-point method with model reduction.


    List of Figures

    1.1 Proposed transmission network structure.
    2.1 A coupled microstrip transmission line.
    2.2 Simple lumped model for two coupled interconnects
    2.3 Block diagram of an equalized transmission channel (from [3]).
    2.4 Simplified model for full-duplex transmission over a linear multi-input/multi-output channel
    3.1 Analytical solution from equation 3.16 vs. Spice simulation results
    4.1 An illustrative eye diagram
    4.2 Example of a data eye
    4.3 Predistorted signal: equalizing filter output
    4.4 Examples of output signal for 32-bit interconnect network
    4.5 Eye diagrams for a 32-bit interconnect network
    4.6 Frobenius norm of the bus impulse response.
    4.7 System with smoothing filter at the receiver end.
    4.8 Example of output signals for systems with and without the equalizing filter designed
    4.9 Pseudo-random test: eye diagrams for systems with and without the equalizing filter designed
    4.10 Worst-case test vs. pseudo-random test
    4.11 Worst-case performance of different equalizing filters designed with the LP method
    4.12 Indirect coupling between non-adjacent lines
    4.13 Eye diagram for system with equalizing filters designed by the LP method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted.
    4.14 Magnitude of overshoot increases with the size of the equalizing filter designed with the LP method
    4.15 The convolution procedure of the time-variant FIR filter


    Acknowledgments

    First of all, I would like to thank my supervisor Dr. Mark Greenstreet. This thesis would not have been possible without his inspiration, extensive support, patience and encouragement. I would also like to thank my husband, Rui Li, for his consistent support.

    JIHONG REN

    The University of British Columbia

    June 2002


    Chapter 1

    Introduction

    1.1 Motivation

    Advances in digital integrated circuit (IC) fabrication technology have resulted in exponential growth in the speed and integration levels of ICs. With more and more circuits placed on each die, high-performance systems require larger and larger I/O bandwidth. This demand has been addressed by increasing the number of high-speed signals and the per-pin interconnection bandwidth. Although the number of I/O pins per IC has grown since the 1970s to several hundred now [18], this growth is being rapidly outpaced by the bandwidth demands. To continue to improve overall system performance, the per-pin interconnection bandwidth must scale with the speed and integration level of ICs. However, without new approaches, we will soon reach the limit set by the intrinsic properties of copper lines.

    The number of I/Os increases by 12% per year, half of which is due to the increase in chip perimeter and half of which is due to the increase in pin density. On chip, both the number of devices and clock rates have increased at 50-60% per year, creating a growing bandwidth gap. Bit rates and pin densities have risen to the point that interconnections are no longer well-behaved short interconnections. With the decreasing cross-sectional areas of interconnections, the line resistance per unit length has increased to a point that long interconnections can no longer be considered lossless. Resistive effects are particularly severe at high bit rates because of both the high-frequency roll-off of RC transmission lines and the increase of resistance with frequency due to the skin effect. To achieve maximum packing density, designers attempt to place signal lines as close to each other as possible. This introduces problems of electromagnetic coupling (cross-talk), which are exacerbated by high data rates. Cross-talk has become a critical issue in interconnect performance and hence overall system performance. Traditionally, cross-talk is reduced by carefully controlling line geometry and arranging circuits to decrease the coupled line length. Moreover, signaling conventions that are less susceptible to coupled energy can be used. These methods reduce cross-talk in a somewhat ad hoc way. For example, as a rule of thumb, a ratio of two-to-one for line spacing against line width is commonly used, based on the assumption that cross-talk decreases monotonically with the increase in line spacing. However, this simple assumption can fail for high bit-rate design. The relationship between line spacing and line width is non-linear, and a two-to-one ratio of spacing to width may actually result in higher coupled energy than a smaller line spacing [11][20]. Furthermore, while these methods might reduce the amount of cross-talk, the problem of cross-talk still exists. New approaches to cross-talk reduction are needed.

    1.2 Method and Proposed System Structure

    Equalizing filters have been used effectively for cross-talk cancellation in acoustic applications such as telephone subscriber line systems [1][6][7]. Recently, they have been used to compensate for the frequency-dependent attenuation of transmission lines [2].


    Figure 1.1: Proposed transmission network structure. (Diagram blocks: transmitter, filter network with one filter per wire, bus, receiver.)

    This thesis explores the effectiveness of equalizing filters in cross-talk cancellation for high-bandwidth, digital communication. The proposed system structure is depicted in figure 1.1. In this transmission system, an equalizing filter is assigned to each wire of the bus. Each filter takes the input signals on a wire and its adjacent wires as its inputs, and outputs a predistorted signal onto the wire. For an $n$-bit bus, the filter system can be viewed as an $n \times n$ network. Cross-talk is eliminated if the filter network is designed in a way that the concatenation of the filter network and the bus has a frequency response in the form of a diagonal matrix.
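    The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `filter_network`, the tap values, the 4-wire bus, and the wrap-around treatment of the edge wires are all hypothetical and are not taken from the thesis.

```python
import numpy as np

def filter_network(bits, taps):
    """Apply one FIR equalizing filter per wire.

    bits: array of shape (n_wires, n_samples), the transmitted symbols.
    taps: array of shape (n_wires, 3, n_taps); taps[i, k] holds the FIR
          coefficients that wire i applies to wire i-1, i, i+1 (k = 0, 1, 2).
    Edge wires wrap around (an assumption made here for simplicity).
    """
    n_wires, n_samples = bits.shape
    out = np.zeros((n_wires, n_samples))
    for i in range(n_wires):
        for k, offset in enumerate((-1, 0, 1)):
            j = (i + offset) % n_wires
            # Causal FIR: weighted sum of present and past symbols on wire j.
            out[i] += np.convolve(bits[j], taps[i, k])[:n_samples]
    return out

# Hypothetical 4-wire bus with 2-tap filters.
bits = np.array([[1, 0, 1, 1],
                 [0, 0, 1, 0],
                 [1, 1, 0, 0],
                 [0, 1, 0, 1]], dtype=float)
taps = np.zeros((4, 3, 2))
taps[:, 1, 0] = 1.0    # pass the wire's own symbol through
taps[:, 0, 0] = -0.1   # subtract a fraction of each neighbour's symbol
taps[:, 2, 0] = -0.1
predistorted = filter_network(bits, taps)
```

    Each output wire depends on at most three input wires, so the network is a banded $n \times n$ system rather than a full one.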

    Several optimal filter design strategies are explored, such as the linear programming method and the least-squares method. Matlab simulation results show that the resulting filters dramatically reduce cross-talk and substantially increase the maximum bandwidth that can be achieved by buses on PC boards. Thus, the equalizing filter method is promising for cross-talk cancellation and merits further investigation.

    1.3 Contributions

    This thesis demonstrates that linear programming models provide effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high-speed digital buses on printed circuit boards. The following are the major contributions supporting this thesis:

    • Equalizing filter design for high-speed digital buses can be formulated as a least squares optimization problem, using an $L_2$ metric for optimality. This metric ensures the quality of the received signal on average.

    • The $L_\infty$ metric corresponds to the traditional eye height measurement of signal integrity and guarantees worst-case performance. The filter design problem for $L_\infty$ optimality can be formulated as a linear programming problem.

    • An evaluation of the linear programming and least squares methods for a variety of filter configurations shows that both offer a dramatic increase in bandwidth when compared with a bus with no filter or with transmitter pre-emphasis without cross-talk cancellation. Furthermore, the filters designed for the $L_\infty$ optimality criterion using linear programming significantly outperform their counterparts designed by the traditional least-squares method when evaluated for digital data transmission.

    • To evaluate these methods, I implemented them both using Matlab. In doing so, I found that the Matlab optimization package does not always converge for the linear programming problems presented in this thesis. Therefore, I implemented an interior-point method with a model reduction technique that successfully solves the linear programming problems encountered.

    1.4 Thesis Outline

    In this thesis, Chapter 2 introduces the equalizing filter technique and its existing applications. Chapter 3 describes a coupled distributed RLC model for transmission lines. Based on this model, Chapter 4 discusses various techniques, such as least squares and linear programming, that I explored to design optimal linear FIR equalizing filters. Chapter 5 is devoted to Mehrotra's interior-point method with a model reduction technique that is used to solve the particular linear programming problem introduced in Chapter 4.


    Chapter 2

    Background

    Computer system performance is often limited by communication bandwidths between chips and between subsystems. A typical signaling system consists of a transmitter, a channel, and a receiver. The transmitter encodes digital information as analogue waveforms on the transmission channel, such as a circuit board trace. On the other end of the transmission channel, the receiver samples and quantizes the signal to recover the original digital information. Although we often think of transmission channels such as wires as being ideal, with zero resistance, capacitance and inductance, real wires are not ideal but rather parasitic circuit elements whose geometry affects their electrical properties. Moreover, with small line spacing, inductive and capacitive cross-talk can severely degrade signal integrity. With the growth in integration levels, the interconnect line width and spacing decrease, and interconnect has become a bottleneck in high-speed digital designs.

    This chapter first discusses the channel characteristics, particularly PC board traces.

    I then provide background on the equalizing filter technique and an overview of its related,

    existing applications.


    Figure 2.1: A coupled microstrip transmission line. (Dimension labels: t, s, w, h.)

    Figure 2.2: Simple lumped model for two coupled interconnects

    2.1 Transmission channel limitations

    Transmission channels, such as PC board traces and coaxial or twisted-pair cables, have limited bandwidths that are determined by their physical characteristics: the size and construction of their conductor and shield, and the dielectric material. In this thesis, I am particularly interested in high-speed interconnect on PC boards. Thus, the following discussion focuses on PC board traces. Figure 2.1 shows typical microstrip interconnections. A simple lumped model for two coupled interconnects is shown in figure 2.2.

    The resistance per unit length of a trace is given by the resistivity of the trace material (typically copper) divided by the cross-sectional area of the trace. The cross-sectional area is the product of the width of the trace and its thickness. The width is determined by the design. The thickness is fixed when the board is manufactured and is specified in ounces of copper per square foot. A board with 1 oz copper has a conductor thickness of roughly 35 microns. More accurate models consider the skin effect: at high frequencies, currents flow closer to the surface of the trace, resulting in a frequency-dependent increase in the series resistance [10][3].
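    The arithmetic behind these figures can be checked with a short Python sketch. The physical constants are standard room-temperature values for copper; the 200 micron trace width is a hypothetical example, and the helper names are illustrative rather than from the thesis.

```python
import math

OUNCE_G = 28.3495          # grams per avoirdupois ounce
SQ_FT_CM2 = 929.0304       # square centimetres per square foot
CU_DENSITY = 8.96          # g/cm^3, copper at room temperature
CU_RESISTIVITY = 1.68e-8   # ohm*m, copper at room temperature
MU_0 = 4e-7 * math.pi      # H/m, permeability of free space

def copper_thickness_m(weight_oz):
    """Foil thickness for a copper weight given in oz per square foot."""
    return weight_oz * OUNCE_G / (CU_DENSITY * SQ_FT_CM2) * 1e-2

def dc_resistance_per_m(width_m, thickness_m):
    """Resistivity divided by the cross-sectional area of the trace."""
    return CU_RESISTIVITY / (width_m * thickness_m)

def skin_depth_m(freq_hz):
    """Depth at which current density falls to 1/e of its surface value."""
    return math.sqrt(CU_RESISTIVITY / (math.pi * freq_hz * MU_0))

t = copper_thickness_m(1.0)
r = dc_resistance_per_m(200e-6, t)   # hypothetical 200 micron wide trace
d = skin_depth_m(1e9)
print(f"thickness {t * 1e6:.1f} um, R {r:.2f} ohm/m, "
      f"skin depth at 1 GHz {d * 1e6:.2f} um")
```

    With these constants the 1 oz thickness comes out close to the 35 microns quoted above, and the skin depth at 1 GHz is only a couple of microns, which is why the series resistance grows with frequency.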

    The capacitance per unit length ($C$) and the inductance per unit length ($L$) of a microstrip trace are determined by many factors including its width and height and its separation from the ground plane. Electric and magnetic fields between adjacent traces lead to the coupling capacitance and the mutual inductance, respectively.

    For PC board traces, the loss in transmission is primarily due to the series resistive component of the copper ($R$). Because of this loss, without a special transmission scheme, off-chip signaling on long wires, even with good current-mode signaling methods, is limited to about 1 GHz [2]. Full-swing unterminated signaling methods that are used in most digital systems have even lower limits. With narrow wires and smaller line spacing, the coupling inductance and capacitance between adjacent lines approach the level of self-inductance and self-capacitance. In high-speed circuits, because of fast signal rise times, coupling effects are severe and have become a primary concern for present and future high-speed, high-density circuit design. Besides the resistive properties of the line, the coupling effects further limit the maximum bit rate at which data can be transmitted correctly.

    2.2 Equalization

    An ideal transmission channel would in all cases deliver the near-end signal $v_{in}(t)$ from the driver without distortion to the far-end receiver, i.e. $v_{out}(t) = v_{in}(t - \tau)$, where $\tau$ is the propagation delay across the channel. Thus, an ideal channel would have the transfer function $H(s) = e^{-s\tau} I$, where $s = j\omega$ and $I$ is the identity matrix. If an equalizing filter has a transfer function that equals the inverse of the transfer function of the channel, the concatenation of the equalizer and the channel has a flat frequency and phase response. This is the equalization technique widely used to actively compensate for the channel transfer function.

    Figure 2.3: Block diagram of an equalized transmission channel (from [3]). (Blocks: transmitter, equalizer G(s), channel H(s).)

    Channel equalization can be performed at the transmitter end, as shown in figure 2.3, preceding the actual channel driver. Transmitters that utilize equalizing filters are called pre-distorting transmitters. The equalizing filter can also be incorporated into the receiver, which is called receiver equalization. It can also be split between the two ends.
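    The inverse-channel idea can be illustrated numerically at a single frequency. The sketch below is an assumption-laden example: the 3-wire channel matrix `H`, the evaluation frequency, and the target delay are made-up values chosen only to show that multiplying a channel by its inverse, times a pure delay, leaves no off-diagonal (cross-talk) terms.

```python
import numpy as np

# Hypothetical 3-wire channel response at one frequency: a strong direct
# path on the diagonal and weaker coupling to adjacent wires.
H = np.array([[0.80 - 0.20j, 0.15 + 0.05j, 0.00],
              [0.15 + 0.05j, 0.80 - 0.20j, 0.15 + 0.05j],
              [0.00,         0.15 + 0.05j, 0.80 - 0.20j]])

omega = 2 * np.pi * 1e9   # rad/s, assumed evaluation frequency
tau = 100e-12             # s, assumed target delay

# Equalizer chosen so that channel times equalizer is a pure delay.
G = np.linalg.inv(H) * np.exp(-1j * omega * tau)

cascade = H @ G
off_diagonal = cascade - np.diag(np.diag(cascade))
print(np.abs(off_diagonal).max())   # numerically zero: no cross-talk terms
print(np.abs(np.diag(cascade)))     # unit magnitude on every wire
```

    A practical equalizer can only approximate this inverse with a finite, causal filter, which is what the design methods in section 2.2.1 address.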

    Pre-distorting Transmitters

    Pre-distorting transmitters integrate equalizing filters, commonly realized as finite impulse response (FIR) digital filters. While infinite impulse response (IIR) filters [9] can be more flexible than FIR filters, they are generally not used for high data rate transmission because of the difficulty of calculating the IIR recurrence (i.e. feedback) at very high rates. The inputs to the equalizing FIR filters are the present and past transmitted symbols. The output of the FIR filter is a weighted sum of these symbols. The length of the filter depends on the number of symbols that affect the response of the channel to the current symbol. The filter coefficients depend on the channel characteristics.

    Pre-distorting transmitters were first used by Poulton et al. [2] in a serial channel over copper wires at 4 Gb/s to reduce intersymbol interference caused by frequency-dependent attenuation of the channel. Later, other groups [4][17] used the same technique to design high-speed serial link transceivers. FIR equalizing filters built into transmitters are easy to implement at very high speed because the transmitted symbols are available at the transmitter end. Furthermore, because the transmitted symbols are either 1s or 0s, multiplication with the filter coefficients is easy. For example, in [2], a five-tap FIR filter is implemented with digital adders, and a digital-to-analog converter (DAC) is used to generate pre-distorted pulses. However, because transmitters generally do not have information about the received signals, FIR filter coefficients are obtained either by characterizing the channel properties in advance [2][4], or by adaptive implementation with feedback information from the receiver end [17].
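    The remark that binary symbols make the multiplications trivial can be made concrete. In the sketch below, the function name `fir_output` and the five coefficient values are hypothetical; the point is only that with 0/1 symbols each product reduces to either adding a coefficient or skipping it.

```python
def fir_output(bits, coeffs):
    """Output of an FIR pre-distortion filter for 0/1 symbols.

    bits[k] is the symbol sent at time k; coeffs[m] weights the symbol
    sent m bit times earlier. Because each symbol is 0 or 1, every
    product reduces to "add the coefficient or skip it".
    """
    out = []
    for k in range(len(bits)):
        acc = 0.0
        for m, c in enumerate(coeffs):
            if k - m >= 0 and bits[k - m] == 1:
                acc += c
        out.append(acc)
    return out

# Hypothetical five-tap filter: main tap plus small negative correction taps.
coeffs = [1.0, -0.25, -0.125, -0.0625, -0.03125]
levels = fir_output([1, 1, 0, 0, 1, 0], coeffs)
```

    The resulting `levels` are the analog amplitudes a DAC would drive onto the wire in each bit time.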

    Receiver Equalization

    Receiver equalization can be realized either with analog filters preceding the analog-to-digital converter (ADC) or with digital filters following the ADC. The latter is the usual technique because digital filters are easy to implement and adapt. Moreover, more complex and non-linear filters can be implemented. However, it is well known that receiver equalization amplifies high-frequency noise [8]. Furthermore, high-speed ADC technology has historically lagged behind high-speed DAC technology. Therefore, pre-distorting transmitters are commonly used in high-speed transmission systems that run at GHz speeds. Recently, Horowitz's group realized an 8-Gsamples/s ADC in 0.25 µm CMOS, which makes high-speed links with equalization at the receiver end possible [19].

    2.2.1 Design Methods

    The following are two methods that are currently used to design equalizing filters.


    Zero-forcing method

    The transfer function of the channel can be derived from models established for each particular channel (reviewed in [18]). The frequency response of the channel, and from it the desired frequency response of the equalizing filter, is then calculated at each frequency point. This set of discrete points is used to obtain a discrete impulse response function using the inverse Fourier transform. The following two steps are used to obtain a more manageable impulse response function.

    • Windowing: $h_w[n] = h_d[n]\, w[n]$, where $h_d[n]$ is the desired impulse response and $w[n]$ is the windowing function. This step is needed to obtain a filter with a finite number of taps.

    • Delaying: $h_w[n]$ is shifted to the right until the samples are all indexed by a non-negative integer to obtain a causal filter.

    In practice, large windows must be used to obtain effective equalizing filters. Accordingly, many researchers have turned to using optimization methods to obtain good approximate equalizing filters. This is the approach that I take in this thesis.
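    The steps above can be sketched for a single wire as follows. Everything specific here is an assumption for illustration: the two-tap channel, the FFT size, the nine-tap filter length, and the default rectangular window are not taken from the thesis.

```python
import numpy as np

def zero_forcing_taps(channel_ir, n_taps=9, n_fft=64, window=None):
    """Frequency-sampling design of an FIR approximation to 1/H."""
    H = np.fft.fft(channel_ir, n_fft)       # channel response on a frequency grid
    g_desired = np.fft.ifft(1.0 / H).real   # desired (circular) impulse response
    # Delaying: rotate by `half` samples so that samples at small negative
    # indices move to non-negative indices, giving a causal filter.
    half = n_taps // 2
    g_causal = np.roll(g_desired, half)[:n_taps]
    # Windowing: keep n_taps samples, optionally tapered by `window`.
    if window is None:
        window = np.ones(n_taps)            # rectangular window
    return g_causal * window

# Hypothetical single-wire channel with one strong post-cursor echo.
channel = np.array([1.0, 0.5])
taps = zero_forcing_taps(channel)
equalized = np.convolve(channel, taps)      # ideally one unit pulse, delayed
```

    With only nine taps the cascade is close to a delayed unit pulse but leaves a small residual at its tail, which illustrates why larger windows are needed for an effective zero-forcing filter.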

    Least Squares Minimization

    With an ideal transmission channel, the received signal is a delayed version of the transmitted signal. Using least squares minimization, the equalizing filter design problem becomes that of determining the values of the filter coefficients that minimize the $L_2$ norm of the difference between the received signal and the delayed version of the transmitted signal.

    This method is used in optimal pre-emphasis equalizing filter design in [2][19] to build serial links that operate at over 1 Gb/s. It is also widely used to design equalizers for telephone subscriber systems [1][6][7].
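    A minimal single-wire sketch of this formulation is given below. The three-tap channel, the training length, and the function name `lsq_equalizer` are assumptions for illustration; the thesis's own formulation for a coupled bus appears in Chapter 4.

```python
import numpy as np

def lsq_equalizer(channel_ir, n_taps, delay, n_bits=400, seed=0):
    """Fit FIR taps so that channel(filter(x)) approximates x delayed."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n_bits).astype(float)   # pseudo-random 0/1 bits
    # Convolution is commutative, so pass the bits through the channel
    # first; column m of A is that signal delayed by m samples.
    y = np.convolve(x, channel_ir)[:n_bits]
    A = np.column_stack([np.concatenate([np.zeros(m), y[:n_bits - m]])
                         for m in range(n_taps)])
    target = np.concatenate([np.zeros(delay), x[:n_bits - delay]])
    # Least squares: minimize the L2 norm of A @ taps - target.
    taps, *_ = np.linalg.lstsq(A, target, rcond=None)
    return taps

channel = np.array([0.6, 0.3, 0.1])    # hypothetical lossy channel
taps = lsq_equalizer(channel, n_taps=8, delay=0)
overall = np.convolve(channel, taps)   # ideally a unit pulse at `delay`
```

    Because the error is averaged over the training sequence, this criterion controls the received signal quality on average rather than for the worst-case bit pattern.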


    Figure 2.4: Simplified model for full-duplex transmission over a linear multi-input/multi-output channel. $H(t)$, $G(t)$, $P(t)$ and $R(t)$ are the impulse responses of the far-end channel, near-end channel, transmitter filter and receive filter respectively. (Other signal labels in the diagram: a(t), b(t), n(t).)

    2.2.2 Application of equalizing filters in cross-talk cancellation for the local telephone subscriber loop

    Equalizing filters are used to reduce intersymbol interference caused by the characteristics of a single channel [2][4][17][19]. Until now, no work has been reported on the application of equalizing filters to cross-talk cancellation for high-speed buses that run at multi-Gb/s rates. Along with the limited bandwidth of transmission channels, cross-talk is another critical problem that limits the maximum data rate that can be achieved by high-density wide buses. Local telephone subscriber loops have the same problem. Bundles of twisted copper wires are used in local telephone subscriber loops. Because of the close physical proximity, cross-talk interference from neighbouring channels is one of the major limitations on the maximum data rate that can be achieved over the loops [7]. Multichannel equalization can effectively suppress both near- and far-end cross-talk [6][7].

    In these papers, a cable of twisted pairs that is terminated at a single physical location is treated as a single multi-input/multi-output channel. Cross-talk is then characterized by off-diagonal components of the matrix impulse response of the channel. The multichannel adaptive FIR equalizers, the transmitter and the receiver process the entire vector of inputs and outputs (see figure 2.4). Rather than directly diagonalizing the system transfer function matrix, the multichannel equalizers are designed to minimize the $L_2$ norm of the difference between the received signal and the transmitted waveform. In Salz's work [16], the minimum mean square error (MMSE) linear equalizer for the channel is completely specified, assuming uncorrelated data and white noise. Later, Honig et al. [6] generalized Salz's work by assuming correlated data symbols, pulse amplitude modulation (PAM) signals and colored noise.

    2.3 Summary

    The equalization technique has been successfully used to compensate for resistive effects of transmission lines [2][4][17]. With this technique and carefully chosen signaling methods, multi-Gb/s serial links have been built. Equalization is also commonly used in telephone subscriber systems to cancel near-end and far-end cross-talk [7][1][6]. In this thesis, I explore the effectiveness of the equalization technique in cross-talk cancellation for high-speed, off-chip buses. Moreover, besides the least squares optimization technique that is commonly used to design equalizing filters, this thesis is the first work that formulates the optimal equalizing filter design problem as a linear programming problem for high-speed digital buses.


    Chapter 3

    Coupled Distributed RLC Interconnect Model

    3.1 Interconnect Model

    An electrical model of a uniform transmission line has inductance $L$, resistance $R$, capacitance $C$ and parallel conductance $G$, all per unit length. The $G$ term models the effects of current leakage and is practically zero for most digital transmission on integrated circuits and printed circuit boards.

    We would like our system to be able to operate at bit rates greater than 2 Gbits/sec. Assuming that the rise and fall times are 10% of the bit time, edges have an electrical length of $l$ = Rise time (ps)/Delay (ps/cm) = 50 (ps)/33 (ps/cm) = 1.5 cm, where 33 ps/cm is the propagation delay of light in a vacuum. The propagation delay of signals traveling in other media such as a PCB trace is larger [10], and thus the corresponding electrical length would be even smaller. For example, the common FR-4 printed circuit board material has a dielectric constant of about 4.5 and a propagation delay of about 71 ps/cm. The electrical length of an edge at 2 Gbits/sec is then 50 ps divided by 71 ps/cm, or 0.7 cm. As a rule of thumb, distributed models should be used when the wire length is greater than or equal to $l/6$. Thus the critical dimension separating lumped from distributed systems for printed circuit boards is 0.117 cm. The wire lengths we consider here are in the range of 2 to 50 cm. Thus a distributed model is needed to correctly model the behavior of this system at multi-gigabit/sec data rates. Assuming the TEM mode of wave propagation, for a lossy multiconductor system of $n$ wires, we have an $n \times n$ inductance matrix $L$, capacitance matrix $C$ and resistance matrix $R$, where $L_{ij}$ and $C_{ij}$ are the mutual inductance and coupling capacitance between lines $i$ and $j$ respectively. For simplicity, the following assumptions are made:

    - Coupling between lines is entirely due to mutual inductance and mutual capacitance.
    - There is no conductance between wires of the bus or between wires of the bus and ground. Only coupling between adjacent lines is taken into account. We ignore direct coupling between wires of the bus that are not adjacent.
    - Every wire is assumed to have the same characteristics.
    - Wires are assumed to be arranged around a cylinder so that every wire is the same as the others.
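The electrical-length estimate at the start of this section is simple arithmetic, and a short sketch makes it easy to repeat for other bit rates or board materials. The function names below are illustrative only; the delay figures (33 ps/cm and 71 ps/cm), the 10% rise-time fraction and the one-sixth rule of thumb are taken from the text.

```python
# Rule-of-thumb check: is a trace long enough to need a distributed model?
VACUUM_DELAY_PS_PER_CM = 33.0  # propagation delay of light in vacuum (from the text)
FR4_DELAY_PS_PER_CM = 71.0     # approximate propagation delay on FR-4 (from the text)


def edge_electrical_length_cm(bit_rate_gbps, delay_ps_per_cm, rise_fraction=0.10):
    """Length of trace occupied by one signal edge, in cm."""
    bit_time_ps = 1000.0 / bit_rate_gbps
    rise_time_ps = rise_fraction * bit_time_ps
    return rise_time_ps / delay_ps_per_cm


def needs_distributed_model(wire_length_cm, edge_length_cm):
    """Rule of thumb: use a distributed model when length >= edge length / 6."""
    return wire_length_cm >= edge_length_cm / 6.0


vacuum_len = edge_electrical_length_cm(2.0, VACUUM_DELAY_PS_PER_CM)
fr4_len = edge_electrical_length_cm(2.0, FR4_DELAY_PS_PER_CM)
critical_len = fr4_len / 6.0

print("edge length in vacuum (cm):", vacuum_len)
print("edge length on FR-4 (cm):", fr4_len)
print("critical length on FR-4 (cm):", critical_len)
print("2 cm wire needs distributed model:", needs_distributed_model(2.0, fr4_len))
```

Even the shortest wire considered here (2 cm) is well above the critical length, which is why the lumped approximation is not used.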

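    With the above assumptions, the $\mathbf{R}$ and $\mathbf{L}$ matrices are shown below. The capacitance matrix $\mathbf{C}$ has the same structure as $\mathbf{L}$.

    $$\mathbf{R} = \begin{pmatrix} R & 0 & \cdots & 0 \\ 0 & R & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R \end{pmatrix}, \qquad \mathbf{L} = \begin{pmatrix} L & L_m & 0 & \cdots & L_m \\ L_m & L & L_m & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ L_m & 0 & \cdots & L_m & L \end{pmatrix} \tag{3.1}$$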
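The structure implied by the assumptions above (identical wires, nearest-neighbour coupling only, wires arranged around a cylinder) is a symmetric circulant matrix. A minimal sketch of how such a matrix could be built follows; the function name and the numeric values are hypothetical and chosen only to make the pattern visible.

```python
def ring_coupling_matrix(n_wires, self_value, mutual_value):
    """Build an n x n matrix with self_value on the diagonal and mutual_value
    between each wire and its two neighbours on the ring (indices wrap around)."""
    matrix = [[0.0] * n_wires for _ in range(n_wires)]
    for i in range(n_wires):
        matrix[i][i] = self_value
        for j in ((i - 1) % n_wires, (i + 1) % n_wires):
            if j != i:  # guards the degenerate single-wire case
                matrix[i][j] = mutual_value
    return matrix


# Illustrative values only, not the bus parameters used in the thesis.
coupling = ring_coupling_matrix(4, 4.0, 1.0)
for row in coupling:
    print(row)
```

Each row is a cyclic shift of the row above it, which is the property that later allows the bus response to be described by its first column alone.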


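    The behavior of this distributed system can be described by the following partial differential equations, where the voltage vector $V$ and current vector $I$ are both functions of position $x$ and time $t$.

    $$\frac{\partial V}{\partial x} = -\mathbf{R}\, I - \mathbf{L}\, \frac{\partial I}{\partial t} \tag{3.2}$$

    $$\frac{\partial I}{\partial x} = -\mathbf{C}\, \frac{\partial V}{\partial t} \tag{3.3}$$

    Taking the Fourier transform of these equations yields:

    $$\frac{d \hat{V}}{d x} = -(\mathbf{R} + j\omega \mathbf{L})\, \hat{I} \tag{3.4}$$

    $$\frac{d \hat{I}}{d x} = -j\omega \mathbf{C}\, \hat{V} \tag{3.5}$$

    where $\hat{V}$ is the Fourier transform of $V$, $\hat{I}$ is the Fourier transform of $I$, and $j = \sqrt{-1}$. Differentiating equation 3.4 with respect to $x$ and substituting equation 3.5 into the result gives

    $$\frac{d^2 \hat{V}}{d x^2} = (\mathbf{R} + j\omega \mathbf{L})(j\omega \mathbf{C})\, \hat{V} \tag{3.6}$$

    Let $\mathbf{A} = (\mathbf{R} + j\omega \mathbf{L})(j\omega \mathbf{C})$. Let $\mathbf{T}$ be a diagonalizing matrix for $\mathbf{A}$, i.e., $\Lambda = \mathbf{T}^{-1} \mathbf{A} \mathbf{T}$ is the diagonal matrix whose diagonal elements are the eigenvalues of $\mathbf{A}$. Rewriting equation 3.6 with $\mathbf{A} = \mathbf{T} \Lambda \mathbf{T}^{-1}$ yields:

    $$\frac{d^2 \hat{V}}{d x^2} = \mathbf{T} \Lambda \mathbf{T}^{-1}\, \hat{V} \tag{3.7}$$

    Let $\hat{V}_m = \mathbf{T}^{-1} \hat{V}$ and $\Gamma = \Lambda^{1/2}$, we get

    $$\frac{d^2 \hat{V}_m}{d x^2} = \Gamma^2\, \hat{V}_m \tag{3.8}$$

    This differential equation has the general solution

    $$\hat{V}_m(x) = e^{-\Gamma x}\, a + e^{\Gamma x}\, b \tag{3.9}$$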


    For a bus with non-zero resistive and capacitive or inductive components, the elements of

    and are complex numbers. Combining equation 3.9 with the definition of yields:

    (3.10)

    Assuming all source ends are terminated with an impedance of and the load ends are

    left open, we have the following boundary conditions.

    length (3.11)

    (3.12)

    Combined with equation 3.4 and 3.10, the first boundary condition given above yields:

    length (3.13)

    From equation 3.10, we know that:

    (3.14)

    Thus, equation 3.12 yields:

    (3.15)

    Equations 3.13, 3.15 yield the final solution

    (3.16)

    with

    length length

    length

    (3.17)

    where is the identity matrix. Note that , and . Thus, the

    frequency response of the bus is:

    length length (3.18)


    with defined in equation 3.17. The inverse Fourier transform yields the impulse

    response of the bus which is used extensively in the next chapter. Note that the frequency

    response of the bus is a square matrix at each frequency. The impulse response of the bus

    is also a square matrix at each time sample. Entry $(i, j)$ at time $t$ denotes the response on wire $i$ at time $t$ given an impulse input on wire $j$ at time $0$.

    3.2 Bus parameters and Simulation results

    I validated the model derived above by comparing its predictions with Spice simulations. Figure 3.1a shows the solution of equation 3.16 using Matlab and figure 3.1b shows the Spice simulation results. The parameters used in both simulations are: bus width = 3, length = 5 cm, $R$ = 0.066 ohm/cm, $C$ = 0.8 pF/cm, $L$ = 3.99 nH/cm, coupling coefficients of 0.31 and 0.23, voltage = 5.0 V, bit time = 500 ps, rise time = 10% * bit time = 50 ps. These parameters correspond to microstrip lines 34.5 µm thick (1 oz copper), 75 µm wide with 75 µm separation between lines, running above a ground plane with a dielectric thickness of 100 µm and a dielectric constant of 4.5. The bus parameters are computed using formulas given in [10].


    Figure 3.1: Analytical solution from equation 3.16 (upper panel) vs. Spice simulation

    results (lower panel) of 3-bit bus. All lines are quiet except line 1.


    Chapter 4

    Linear Equalizing Filter Design

    In this chapter, I present techniques for the design of linear equalizing filters. I first introduce the idea of a data eye and its use to quantify filter performance. The next section defines notation that simplifies the mathematical presentation of linear equalizing filter design. Then, I introduce the least squares (LSQ) method and the linear programming (LP) method, followed by test results. Finally, based on the linear FIR filter designs, time-variant FIR filter design and optimal smoothing filter design are discussed.

    4.1 Measurements of filter performance

    The effects of distortion and noise are often illustrated using eye diagrams. An illustrative eye diagram is shown in figure 4.1. It is called an eye diagram because of its shape. During the sample interval, the signal is either distinctly high or distinctly low; it must not pass through the center of the eye. This allows the receiver to unambiguously determine the value of the bit that was transmitted. The signal can change between sampling intervals. I also restrict how high (or low) the signal may go; otherwise, with scaling, any eye opening can be made


    Figure 4.1: An illustrative eye diagram. The plot shows v(t) against t over a sample interval and the next sample interval, marking the eye width w, the high and low targets, the margins h_under and h_over, and the good and bad regions for the signal.

    arbitrarily large. Eye height $h_{\text{height}}$ is defined as

    height under target over (4.1)

    where $h_{\text{under}}$ and $h_{\text{over}}$ are defined in figure 4.1. The eye height and width are often used as an indication of signal integrity. Figure 4.2 shows how a data eye is formed by overlaying a signal waveform over multiple cycles.

    The eye width, $w$ in figure 4.1, is the time during which the separation between high-going
    and low-going signals is greater than zero. In practice, the receiver will attempt to sample

    the signal near the moment of the widest eye opening. Due to uncertainties in the timing of

    the transmitter and receiver and in the delay of the interconnect, the actual sampling may

    occur at some time other than this ideal. The eye-width gives an indication of the robustness

    of the interface to these timing uncertainties.

    In this thesis, the effectiveness of a filter is quantified in the following three ways:

    - eye height of the output signal given a pseudo-random input sequence.
    - eye height of the output signal given the worst-case input sequence.
    - the smallest bit time (or highest bit rate) at which the eye height of the output signals is greater than a specified amount, e.g. 50% of the nominal signal level, and the eye width is greater than another specified amount, e.g. 25% of the bit time.

    Figure 4.2: Example of a data eye. Upper panel shows a random signal. Its corresponding eye diagram is shown in the lower panel.
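As a rough illustration of the first two measures, the sketch below computes a vertical eye opening from received values taken at the sampling instant. It is a simplified stand-in and not the exact definition of equation 4.1: the opening is taken here as the gap between the lowest received high and the highest received low, expressed as a fraction of the nominal swing. The function name and the sample data are hypothetical.

```python
def eye_opening(samples, bits, target=1.0):
    """Vertical eye opening at the sampling instant, as a fraction of the
    nominal swing (2 * target). `samples` are received values, `bits` the
    transmitted symbols (+1 or -1) they correspond to."""
    highs = [s for s, b in zip(samples, bits) if b > 0]
    lows = [s for s, b in zip(samples, bits) if b < 0]
    if not highs or not lows:
        raise ValueError("need at least one high bit and one low bit")
    return (min(highs) - max(lows)) / (2.0 * target)


# Hypothetical received values for five transmitted bits.
received = [0.9, -0.8, 1.1, -1.05, 0.7]
sent = [1, -1, 1, -1, 1]
opening = eye_opening(received, sent)
print("eye opening as a fraction of nominal swing:", opening)
```

A negative result would mean the eye is closed at that sampling instant, since some low bit is received above some high bit.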

    4.2 Preliminaries

    Defining some notation up front makes the presentation of the filter design methods more succinct and direct. The responses of filters and buses are naturally written as convolutions, while linear and least squares problems are naturally formulated with matrices. Here I define some notation to show the connection between various convolutions and their corresponding matrix representations.

    Let $v$ be a vector of size $n$. The components are $v(0), v(1), \ldots, v(n-1)$. I'll write $|v|$ to denote the size of $v$, $\|v\|_2$ to denote the $L_2$ norm of $v$, and $\|v\|_\infty$ to denote the $L_\infty$ norm of $v$.

    Some matrix abbreviations used below are:

    The identity matrix

    The matrix of zeros

    The matrix where

    (4.2)

    4.2.1 Convolution

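    Linear Convolution: Let $u$ and $v$ be two vectors. The linear convolution of $u$ and $v$ is the vector $u * v$ of size $|u| + |v| - 1$ defined below:

    $$(u * v)(k) = \sum_{i=\max(0,\, k-|v|+1)}^{\min(k,\, |u|-1)} u(i)\, v(k-i) \tag{4.3}$$

    Linear convolution is commutative and associative.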
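A direct implementation of linear convolution is only a few lines. The sketch below is illustrative (the function name is not from the thesis) and also checks the commutativity property on a small example.

```python
def linear_convolution(u, v):
    """Linear convolution of two sequences; the result has
    len(u) + len(v) - 1 elements."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


a = [1.0, 2.0, 3.0]
b = [1.0, 1.0]
print(linear_convolution(a, b))
print(linear_convolution(a, b) == linear_convolution(b, a))
```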


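    Circular Convolution: Let $u$ and $v$ be two vectors in $\mathbb{R}^n$. Let $u \circledast v$ denote the circular convolution of $u$ and $v$:

    $$(u \circledast v)(k) = \sum_{i=0}^{n-1} u(i)\, v((k - i) \bmod n) \tag{4.4}$$

    Circular convolution is commutative and associative.

    Let $v$ be a vector and $n$ be an integer with $n \ge |v|$. The zero-extension of $v$ pads $v$ with zero elements to produce a vector of size $n$:

    $$\mathrm{extend}(v, n)(k) = \begin{cases} v(k) & \text{if } k < |v| \\ 0 & \text{otherwise} \end{cases} \tag{4.5}$$

    Zero-extension is a linear operator:

    $$\mathrm{extend}(v, n) = \begin{pmatrix} I_{|v|} \\ 0_{(n-|v|) \times |v|} \end{pmatrix} v \tag{4.6}$$

    Let $\mathbf{Extend}(|v|, n)$ be the left matrix on the right hand side of the equation.

    Linear convolution can be expressed as circular convolution of zero-extended vectors:

    $$u * v = \mathrm{extend}(u, |u|+|v|-1) \circledast \mathrm{extend}(v, |u|+|v|-1) \tag{4.7}$$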
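The relationship between linear and circular convolution can be checked numerically. The following sketch assumes the usual definitions (circular indices wrap modulo the vector length, zero-extension appends zeros); the function names are illustrative.

```python
def extend(v, n):
    """Zero-extend v to length n."""
    if n < len(v):
        raise ValueError("target length is shorter than the vector")
    return list(v) + [0.0] * (n - len(v))


def circular_convolution(u, v):
    """Circular convolution of two equal-length sequences."""
    n = len(u)
    if len(v) != n:
        raise ValueError("vectors must have the same length")
    return [sum(u[i] * v[(k - i) % n] for i in range(n)) for k in range(n)]


def linear_convolution(u, v):
    """Linear convolution, used here as the reference result."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


u = [1.0, 2.0, 3.0]
v = [1.0, -1.0]
n = len(u) + len(v) - 1
via_circular = circular_convolution(extend(u, n), extend(v, n))
print(via_circular)
print(via_circular == linear_convolution(u, v))
```

Padding to length |u| + |v| - 1 is what prevents the wrapped-around terms of the circular sum from overlapping the linear result.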

    Block Linear Convolution: Let be a matrix. We can think of as a column

    of matrices:

    ...

    (4.8)

    where each of the is a matrix. The block linear convolution of and

    is defined similarly as linear convolution:

    (4.9)


    The block linear convolution of matrix and vector is defined simi-

    larly.

    Block Circular Convolution: The block circular convolution of and , where

    :

    (4.10)

    Block circular convolution is associative. It is commutative if the product of the sub-

    matrices is commutative, for example, if the sub-matrices are all symmetric or all circulant

    (circulant matrices are defined in sec 4.2.2 below). Extending the extend operator to block

    matrices, let be a matrix, and let .

    extend (4.11)

    Zero extension on block matrices is a linear operator just as it is for vectors.

    Block linear convolution can be expressed as block circular convolution of zero-

    extended matrices:

    extend extend (4.12)

    where .

    4.2.2 Matrix Representations of Convolution

    In this section, I will first present matrix representations for linear convolution, then extend

    it to block linear convolution.

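    Let $v$ be a vector with $n = |v|$. Let $\mathbf{C}(v)$ be the $n \times n$ circulant matrix [5] generated by $v$:

    $$\mathbf{C}(v)(i, j) = v((i - j) \bmod n) \tag{4.13}$$

    The form of this circulant matrix is depicted below:

    $$\mathbf{C}(v) = \begin{pmatrix} v(0) & v(n-1) & \cdots & v(1) \\ v(1) & v(0) & \cdots & v(2) \\ \vdots & \vdots & \ddots & \vdots \\ v(n-1) & v(n-2) & \cdots & v(0) \end{pmatrix} \tag{4.14}$$

    Let $u$ and $v$ be two vectors of the same size. Equations 4.4 and 4.13 yield:

    $$u \circledast v = \mathbf{C}(u)\, v \tag{4.15}$$

    Furthermore, if $v_1$, $v_2$, . . . , $v_k$ are all vectors of the same size, then

    $$\mathbf{C}(v_1 \circledast v_2 \circledast \cdots \circledast v_k) = \mathbf{C}(v_1)\, \mathbf{C}(v_2) \cdots \mathbf{C}(v_k) \tag{4.16}$$

    Note that matrix multiplication of circulant matrices is commutative and associative, just like the corresponding convolution.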
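The link between circulant matrices and circular convolution can be verified on a small case. This sketch assumes the convention that the generating vector forms the first column of the matrix; the helper names are illustrative.

```python
def circulant(v):
    """Circulant matrix whose first column is v: entry (i, j) is v[(i - j) mod n]."""
    n = len(v)
    return [[v[(i - j) % n] for j in range(n)] for i in range(n)]


def mat_vec(m, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def mat_mul(a, b):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def circular_convolution(u, v):
    n = len(u)
    return [sum(u[i] * v[(k - i) % n] for i in range(n)) for k in range(n)]


u = [1, 2, 3]
v = [0, 1, 0]
print(circulant(u))
print(mat_vec(circulant(u), v), circular_convolution(u, v))
# Products of circulant matrices commute, mirroring commutative convolution.
print(mat_mul(circulant(u), circulant(v)) == mat_mul(circulant(v), circulant(u)))
```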

    Let be a matrix, and let row be the vector such that

    row (4.17)

    Likewise, let col be the vector such that

    col (4.18)

    Convolution can be expressed with all arguments represented as matrices:

    col

    (4.19)

    Using equation 4.7, linear convolution can be expressed using matrix multiplication:

    extend extend (4.20)


    Define as the matrix given by

    extend (4.21)

    The form of this matrix is depicted below:

    v(0)

    v(1)

    v(m1)

    v(m1)

    v(m1)

    v(m1)

    v(m1)

    v(0)

    v(0)

    0

    0(4.22)

    where . The linear convolution of can be written as

    col

    (4.23)

    where

    (4.24)

    The matrix representation for linear convolution described above can be extended

    to block linear convolution. Let be a matrix. As described in the previous

    section, the matrix can be regarded as a column of submatrices of dimension

    each.

    The block circulant matrix generated by is

    ......

    ...

    (4.25)


    For those who prefer formulas to ellipses:

    div div(4.26)

    Let be matrices. Equations 4.10 and 4.25 yield:

    (4.27)

    Using equation 4.12, block linear convolution of and

    can be expressed using matrix multiplication:

    extend extend

    extend

    col

    (4.28)

    where is defined as extend , , and col is

    defined in the obvious manner.

    Block linear convolution of with can also be expressed as

    matrix multiplication:

    extend (4.29)

    where .

    Define the following operators:

    block creates circulant blocks from vector .

    div (4.30)

    The form of this matrix is depicted below:

    ...

    (4.31)


    vec2cir converts a vector to a circulant matrix:

    vec2cir extend block (4.32)

    Define as the matrix given by vec2cir . This matrix has the

    same form as (see equation 4.14), except that now each block is a circulant

    matrix of size . Notice that is a block circulant matrix and

    extend col

    With these operators, it is straightforward to see that for and ,

    col (4.33)

    where .

    The block linear convolution of two vectors and is defined

    as:

    col (4.34)

    where . The block linear convolution of

    can be written as

    col

    (4.35)

    where

    (4.36)


    4.3 Least Squares Optimization Method with Pseudo-random Input

    4.3.1 Motivation

    As discussed in the previous section, an ideal bus would in all cases deliver the near end signal to the far end receiver without distortion, after some amount of delay. Thus, in the ideal case, the expected output signal is simply a delayed version of the input signal. The goal of filter design is to find a set of filter coefficients that make the output signal as close to this ideal output signal as possible. Following the example of [6][7], I use RMS error ($L_2$ metric) in this section as a measure of the distance of the filter output from the ideal, delayed signal. In this case, filter design can be formalized as a least squares optimization problem. In section 4.4, I use the worst-case difference between a signal and the target as a measure of distance ($L_\infty$ metric) and show that the resulting filter design problem is an instance of linear programming.

    4.3.2 Least Squares problem formulation

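    Input

    Consider a bus with $w_{bus}$ wires. Let $l_{input}$ denote the length of the input training sequence in bit times. Thus, an input is a function that gives a value, +1 or -1, for each wire $i \in \{0, \ldots, w_{bus}-1\}$ at each time $t \in \{0, \ldots, l_{input}-1\}$. This function can be represented by a vector, $\mathit{input}$, with $\mathit{input}(t \cdot w_{bus} + i)$ denoting the value of the $i$-th wire at time $t$. Because filter coefficients are given in tap times, oversampling is needed to convert the input from a sequence in bit times to a sequence in tap times.

    Let $\mathit{input} \in \mathbb{R}^{l_{input} w_{bus}}$ be a vector and $n$ be a positive integer. The oversample operator, $\mathrm{oversample}(\mathit{input}, n, w_{bus})$, computes $\mathit{in} \in \mathbb{R}^{n\, l_{input} w_{bus}}$, which is the $n$ times oversample of $\mathit{input}$:

    $$\mathit{in}(k) = \mathit{input}\big(w_{bus}\,(k \;\mathrm{div}\; (n\, w_{bus})) + (k \bmod w_{bus})\big)$$

    The oversample operator is linear. In particular, $\mathbf{Oversample}(l_{input}, n, w_{bus})$ is a matrix with:

    $$\mathbf{Oversample}(l_{input}, n, w_{bus})(k, m) = \begin{cases} 1 & \text{if } k \;\mathrm{div}\; (n\, w_{bus}) = m \;\mathrm{div}\; w_{bus} \text{ and } k \bmod w_{bus} = m \bmod w_{bus} \\ 0 & \text{otherwise} \end{cases}$$

    Thus,

    $$\mathrm{oversample}(\mathit{input}, n, w_{bus}) = \mathbf{Oversample}(l_{input}, n, w_{bus})\; \mathit{input}$$

    Define $\overline{\mathit{input}}$ as the vector given by $\mathrm{oversample}(\mathit{input}, n, w_{bus})$. The form of this vector is depicted below: the $w_{bus}$ values of each bit time appear $n$ times in a row before the values of the next bit time begin.

    $$\overline{\mathit{input}} = \big(\underbrace{\mathit{input}(0), \ldots, \mathit{input}(w_{bus}-1)}_{\text{repeated } n \text{ times}},\; \underbrace{\mathit{input}(w_{bus}), \ldots, \mathit{input}(2 w_{bus}-1)}_{\text{repeated } n \text{ times}},\; \ldots,\; \mathit{input}(l_{input} w_{bus} - 1)\big)^{T} \tag{4.37}$$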
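The oversampling step can be sketched in a few lines. The layout assumed here is the one described above: the input vector stores the values of all wires for bit time 0, then all wires for bit time 1, and so on, and each such group is repeated once per tap. The function and argument names are illustrative.

```python
def oversample(bits, taps_per_bit, bus_width):
    """Repeat the per-bit-time group of wire values taps_per_bit times,
    converting a sequence in bit times to a sequence in tap times."""
    if len(bits) % bus_width != 0:
        raise ValueError("input length must be a multiple of the bus width")
    n_bit_times = len(bits) // bus_width
    out = []
    for t in range(n_bit_times):
        frame = bits[t * bus_width:(t + 1) * bus_width]
        for _ in range(taps_per_bit):
            out.extend(frame)
    return out


# Two wires, two bit times, two taps per bit.
bit_sequence = [1, -1, -1, 1]
tap_sequence = oversample(bit_sequence, taps_per_bit=2, bus_width=2)
print(tap_sequence)
```

Because every output element is a copy of exactly one input element, the operation is linear and can equally be written as multiplication by a 0/1 matrix.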


    Buses

    In much the same manner as above, the impulse response of a bus with bus wires is

    a column of bus bus matrices with each matrix giving the response corresponding

    to a particular delay. Let bus denote the length of the bus impulse response in tap

    times. The bus impulse response can be represented by a bus bus bus matrix, bus

    where bus bus out in is the response of the out output wire of the bus after

    a delay of tap times to the in input wire.

    Let in be the vector for the input of the bus in tap time.

    in input

    Let in denote the value of the input at tap time t:

    in in bus bus

    Likewise, let bus denote the bus impulse response at time :

    bus bus bus bus

    Let output be the vector for the output of the bus and let output

    be the output at tap time :

    output bus in (4.38)

    Equation 4.38 has the form of a block linear convolution. Thus,

    output col bus in (4.39)

    where bus input. Moreover, in this thesis, for simplicity, I assume that

    wires are arranged around a cylinder (see chapter 3). This means that all wires have

    the same characteristics, and bus is a circulant matrix. Let be the


    vector whose bus bus components are the first column ofbus .

    That is,

    bus block

    bus

    Thus the output signal of the bus given input in bit time is:

    output col input (4.40)

    where bus input.

    Filter

    In figure 1.1, a filter is depicted for each wire of the bus. Because all wires have

    the same characteristics, I assume that every filter is the same. For a bus-bit bus,

    this filter system can also be viewed as a bus bus input/output network. Thus,

    similar to the bus, the input/output relationship of the filter system with fir taps can

    be expressed as:

    filterOutput col input (4.41)

    where is the filter coefficient vector of the filter for wire 0 of the bus. It


    has the form depicted below:

    fir

    (4.42)

    where denotes the contribution of the input on wire 0 at time 0 to the filter output

    for wire at time .

    Because the bus is symmetric, I restrict my attention to symmetric filters. That is, in

    the vector depicted above,

    for fir bus

    Moreover, inputs on wires far away produce very little cross-talk. Therefore, it may

    be practical to force the filter coefficients for these wires to zero to simplify the im-

    plementation of the filter. In this thesis, filters with various sizes are investigated.

    Filter size is defined as filter length filter width. A fir fir filter contains fir sets

    of fir filter coefficients for inputs on wire itself and the fir nearest wires in both


    direction. Its filter coefficient vector fir is depicted below:

    fir

    fir

    fir

    ...

    fir

    fir

    ...

    fir fir

    Define filterExtend fir fir bus as the matrix depicted below:

    (4.43)

    where denotes the horizontal concatenation of a column vector

    with a matrix . Operator filterExtend fir fir bus transforms fir to the full

    filter coefficient vector in equation 4.42.

    filterExtend fir fir bus filterExtend fir fir bus fir (4.44)


    Denote fir as the vector given by filterExtend fir fir bus , which equals the full

    filter coefficient vector depicted previously. Thus, with equation 4.41, the output

    signal of the filter system with fir fir filters can be expressed as:

    filterOutput col fir input (4.45)

    where fir input.

    Target signal

    Let $\mathit{target}$ be the target signal which is a delayed version of the input signal. I considered two ways to approximate the expected delay.

    - LC delay: length $\cdot \sqrt{LC}$.
    - Approximate the delay by determining the peak of the Frobenius norm of the bus impulse response.

    The second one is more accurate because the effect of resistance is also taken into account, especially for long buses where RC delay dominates LC delay. In this thesis, all results are obtained with the second method.

    Let be the following matrix:

    if

    otherwise

    where is the approximated delay in tap time and

    input

    The target signal is given by

    input (4.46)


    Output signal

    With above analysis, it is straightforward to express the output signal of the system

    with fir fir filters in figure 1.1 using matrix multiplication. Let the vector output

    represent the output of the system in tap time:

    output col h fir input

    col h input fir

    h input extend fir bus filterExtend fir fir bus fir

    (4.47)

    where input fir bus.

    Let

    h input extend fir bus filterExtend fir fir bus

    (4.48)

    Then

    output fir (4.49)

    Least Squares Problem

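    With equations 4.46 and 4.49, the least squares problem is:

    $$\min_{\mathit{fir}} \; \| \mathbf{G}\, \mathit{fir} - \mathit{target} \|_2$$

    Given $\mathbf{G}$ and $\mathit{target}$, I used QR decomposition (i.e. the backslash command in Matlab) to find the vector $\mathit{fir}$ that minimizes the least squares error of the over-determined system $\mathbf{G}\, \mathit{fir} = \mathit{target}$.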
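To make the formulation concrete, here is a heavily simplified sketch: a single wire, one tap per bit, and a short hypothetical channel impulse response. It builds the over-determined system by treating the filter coefficients as the unknowns and solves it through the normal equations with Gaussian elimination. This is a swapped-in technique chosen to keep the example dependency-free; the thesis itself uses QR decomposition through Matlab's backslash operator. All names and numbers are illustrative.

```python
def convolve(u, v):
    """Linear convolution of two sequences."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def convolution_matrix(signal, n_taps):
    """Matrix A such that A times f equals convolve(signal, f) for len(f) == n_taps."""
    n_rows = len(signal) + n_taps - 1
    return [[signal[r - c] if 0 <= r - c < len(signal) else 0.0
             for c in range(n_taps)] for r in range(n_rows)]


def solve_square(m, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def squared_error(a, fir, target):
    """Sum of squared differences between a times fir and the target."""
    return sum((sum(row[c] * fir[c] for c in range(len(fir))) - t) ** 2
               for row, t in zip(a, target))


def design_lsq_filter(channel, bits, n_taps, delay):
    # Convolution commutes, so channel * (bits * fir) == (channel * bits) * fir.
    received = convolve(channel, bits)
    a = convolution_matrix(received, n_taps)
    target = [0.0] * len(a)
    for k, bit in enumerate(bits):
        if k + delay < len(target):
            target[k + delay] = float(bit)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_taps)]
           for i in range(n_taps)]
    atb = [sum(row[i] * t for row, t in zip(a, target)) for i in range(n_taps)]
    return solve_square(ata, atb), a, target


channel = [0.2, 1.0, 0.4, -0.1]                       # hypothetical impulse response
bits = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]     # short training sequence
fir, a, target = design_lsq_filter(channel, bits, n_taps=3, delay=1)
err_with_filter = squared_error(a, fir, target)
err_without_filter = squared_error(a, [1.0, 0.0, 0.0], target)
print("filter taps:", fir)
print("squared error with / without filter:", err_with_filter, err_without_filter)
```

The pass-through filter (a single unit tap) is one of the candidates the optimizer can choose, so the least squares solution can never do worse than having no filter on the training sequence.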

    4.3.3 An example

    To show the effectiveness of the equalizing filter approach in cross-talk cancellation, consider a 32-bit bus with length 5 cm. The electrical parameters of the bus are: $R$ = 0.066 ohm/cm, $C$ = 0.8 pF/cm, $L$ = 3.99 nH/cm, and coupling coefficients of 0.31 and 0.23. Filter design parameters are: filter length = 8, filter width = 2, taps per bit = 4, bit time = 400 ps, length of the training sequence,

    input bits.

    For this particular example, in equation 4.46 and 4.49,

    Bus width bus = 32. Length of the bus impulse response, bus is set to be 16 taps.

    Thus, h is a vector of length: bus bus .

    input is a vector of length: bus input .

    fir .

    is a matrix of size .

    Pseudo-random input sequences are used as test sequences. Comparing the eye opening with and without the designed filter gives an indication of the effectiveness of the filter.

    Figure 4.3 shows the predistorted input signal on wire 1. The waveforms in figure 4.4 clearly show that the filter greatly reduces the overshoot and undershoot, which is also shown by the eye diagrams in figure 4.5. With the designed FIR filter, the eye height is increased from 31% to 82%. This suggests that equalizing filters are a very promising method for cross-talk cancellation on high speed buses. More thorough testing results are presented in section 4.5.

    4.4 Linear Programming Method with Worst-case Input

    In this section, a linear programming method is introduced with the assumption that we can

    solve the formulated linear programming problem.


    Figure 4.3: Predistorted signal: equalizing filter output. The plot shows voltage (V) against t (ns) for the input signal on wire 1 and the predistorted signal on wire 1.

    4.4.1 Motivation

    Although the least squares optimization method works and the FIR filter described greatly improves the eye height of the transmitted signals, this method has several shortcomings.

    - The filter designed by the LSQ optimization method depends greatly upon the pseudo-random input pattern used as the training sequence. To get a good filter design, a long training sequence must be used, which makes the filter design very slow for wide buses, which occur frequently in practice.
    - The design objective is to transmit the bits without error. It is assumed that as long as a bit satisfies the eye specification, it will be received correctly. Thus, moving bits that already satisfy the eye specification closer to the target signal doesn't matter. It's the worst-case pattern that determines the eye height. Thus, the $L_2$ metric doesn't strictly correspond to eye height, the metric defined in equation 4.1. For example, it is possible that for a training sequence, some filter coefficient set produces a very small RMS error but the output signal has one bad trace. It is that one bad trace which determines the eye height. Certainly, we can reformulate the same problem as a linear programming problem, such that for a given training sequence (pseudo-random input), the $L_\infty$ metric is minimized. However, in order to guarantee worst-case performance, ideally, all possible input combinations should be part of the training sequence. This is obviously not practical.

    It turns out that for a given set of filter coefficients, the worst-case input pattern can be determined and thus the worst-case eye height can be computed. This section is devoted to this method, which minimizes the $L_\infty$ metric over all possible inputs.

    Figure 4.4: Examples of output signal for 32-bit interconnect network with (lower panel) and without (upper panel) 8 × 2 equalizing filters designed with the LSQ method. Both panels plot voltage (V) against t (ns) for the input signal and the output signal on wire 1.

    Figure 4.5: Eye-diagrams for a 32-bit interconnect network with (lower panel) and without (upper panel) 8 × 2 equalizing filters designed with the LSQ method. Red traces indicate high signal transmitted. Blue traces indicate low signal transmitted. Panel titles: system without filters (eye height 29%, eye width 75%); system with 8*2 filters (eye height 82%, eye width 75%).

    First, I show that the search space for the metric is convex even when more general,

    non-linear filters are considered. Formalize an eye height specification as a sequence of

    tuples:

    A filter satisfies if and only if for every input every output satisfies:

    output input

    output input

    (4.50)

    where is the number of taps per bit, is the expected delay of the bus, output

    h input and is the filter function. Let denote filter satisfies eye .

    Let $F_1$ and $F_2$ be two filters that satisfy some eye opening constraint $E$. Let $F = \alpha F_1 + (1-\alpha) F_2$, where $0 \le \alpha \le 1$. Because the bus, $h$, is linear, a system with $F$ produces output signals that are the same linear combination of what is produced by systems with $F_1$ and $F_2$. It then follows from equation 4.50 that $F$ also satisfies $E$. Thus the space of filters that satisfy eye opening constraint $E$ is convex.

    The objective is to send -1, 1 signals down the bus as clearly as possible. In this system, every wire of the bus has the same configuration and thus the wires are interchangeable. Thus, the original objective is the same as trying to send a 1 down wire 1 as clearly as possible with the worst-case disturbances from other wires and from preceding and following bits.

    The output signal on wire 1 for the current bit is simply the sum of the effects on wire 1 at the current bit from

    - the input signal on wire 1 for the current bit, which is the signal expected to come through if there is no disturbance.
    - the input signal on wire 1 at other bit times and also the input signals on other wires for the current bit and other bits, which produce disturbances on the first wire at the current bit.

    Thus, the optimization problem can be restated as follows: Given that the current bit input on the first wire is 1, find the set of filter coefficients that makes the output signal on wire 1 at the current bit as close to 1 as possible for the worst-case input sequence, i.e. the one which produces the largest disturbances on wire 1 from other bit times and other wires.

    subject to

    undisturbed disturbances

    undisturbed disturbances

    (4.51)


    4.4.2 Linear Programming Problem formulation

    I now focus on the practical case where the filter is linear and FIR, and show that the design

    problem is an instance of linear programming. The goal remains to send down 1 along

    wire 1 as clearly as possible. A quantified version of this goal is: for the worst case input

    sequence with 1 at the current bit, the output signal at some given sampling time is as close

    to 1 as possible. That is, at this sampling point, the eye height is as high as possible. A
    reasonable sampling point is the delay of the bus. Equation 4.51 shows that to formulate

    the LP problem for the equalizing filter design, we need to know the undisturbed output at

    the sampling point and the largest total disturbances at the sampling point.

    Let in be the input sequence that is 1 bit long and only the bit on the first wire is 1:

    inif

    otherwise

    (4.52)

    Because the whole system is linear and circulant, the response to this pulse input in gives

    us all the information we need to compute the output for the worst-case disturbances. From

    section 4.3, for the system with fir fir FIR filters, we know that the corresponding output

    is given by G fir, where G as given by equation 4.48:

    G h in extend fir bus filterExtend fir fir bus

    where fir bus is the length of the response in tap time. Different rows ofG fir

    represent the response on some wire at some tap time. The contribution of the bit from

    equation 4.52 to the output at the sampling time is:

    undisturbed row bus G fir (4.53)

    Responses from other wires and responses from the first wire arising from earlier and later bits are the disturbances. For example, the disturbance on wire 1 at the sampling time caused by input on the second wire $k$ bit times earlier is the same as the disturbance on wire 2 at the sampling time from the input on the first wire $k$ bit times earlier. Moreover, it is the same as the response of the original pulse input from equation 4.52 observed on wire 2 at the tap time that is $k$ bit times later than the sampling time. These are due to the linearity and symmetry of the system. Thus,

    disturbance row bus G fir (4.54)

    where $\mathrm{disturbance}_{i,k}$ is the disturbance on the first wire at the sampling time given an input of 1 on wire $i$, $k$ bit times earlier. The total disturbance is largest when the contributions from all other bit times and other wires add with the same sign, which gives the worst-case disturbances. Let $d_{i,k}$ denote the worst-case, positive disturbance on wire 1 at the sampling time from the input on wire $i$, $k$ bit times earlier. Noting that each input to the filter is either +1 or -1, the following inequality constraints compute the absolute value function needed to obtain $d_{i,k}$:

    d rowbus

    G fir

    d row bus G fir(4.55)

    Because the cost function is positive monotonic in each of the $d_{i,k}$, either the first constraint or the second constraint is tight at the optimal point.
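The worst-case reasoning above can be illustrated without solving any linear program. For a fixed filter, the sketch below takes a hypothetical pulse response (the response of every wire to a single 1 on the first wire) and adds up the absolute values of all samples that fall a whole number of bit times away from the sampling tap. The data layout, function name and numbers are assumptions made for illustration only.

```python
def worst_case_margin(pulse, sample_tap, taps_per_bit):
    """pulse[w][t] is the response observed on wire w at tap t to a single 1
    sent on wire 0. Returns (undisturbed, worst_total_disturbance, margin)."""
    n_taps = len(pulse[0])
    undisturbed = pulse[0][sample_tap]
    total = 0.0
    first = sample_tap % taps_per_bit
    for wire, response in enumerate(pulse):
        # Taps that are a whole number of bit times from the sampling tap.
        for tap in range(first, n_taps, taps_per_bit):
            if wire == 0 and tap == sample_tap:
                continue  # this sample is the wanted signal, not a disturbance
            total += abs(response[tap])
    return undisturbed, total, undisturbed - total


# Hypothetical two-wire pulse response with two taps per bit.
pulse = [
    [0.0, 0.1, 0.8, 0.2, -0.1, 0.05],   # wire carrying the pulse
    [0.0, 0.05, -0.1, 0.02, 0.0, 0.0],  # neighbouring wire
]
signal, disturbance, margin = worst_case_margin(pulse, sample_tap=2, taps_per_bit=2)
print(signal, disturbance, margin)
```

The absolute value appears because each interfering bit can be +1 or -1, so an adversarial input can always make every contribution push in the harmful direction. In the LP formulation the same absolute value is expressed with the pair of inequality constraints on each disturbance variable.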

    Let be the matrix that contains all the rows in G that matter. The total number of

    rows are . To calculate the total disturbances from other bit times and wires, ideally,

    an infinitely long history should be considered because of the infinite impulse response

    of the bus. This is not practical. Notice that most of the energy of the impulse response
    extends over about 6 times the LC delay of the bus (see figure 4.6). For the particular bus
    model presented in this thesis, the LC delay is about 250 ps. All the results presented here


    Figure 4.6: Frobenius norm of the bus impulse response, plotted against t (ps), with the peak and the future and history regions marked.

    are obtained with:

    bus length (tap time)

    Moreover, notice that the bus impulse response does not rise immediately to its peak. This
    means that the output of the current bit is affected not only by history bits but also by a few future

    bits. Among the rows, there are future bits, 1 current bit and the rest are history

    bits. For bus

    row bus row bus G (4.56)

    Thus rowbus

    fir gives the undisturbed output. The equalizing filter design problem can then be stated
    as a linear programming problem in the variables fir and d:


    row bus

    row bus

    fir

    d

    (4.57)

    4.4.3 Smoothing filter

    It was found that if we average a few taps of the current output bit and use that as the

    objective function of the LP problem, the eye height obtained is better than simply asking

    the optimizer to bring one tap of the current output bit as close to 1 as possible. However,

    a corresponding smoothing filter is needed at the receiver end in order to get the desired

    output signal. Fortunately, such averaging behavior is typical of the input circuits on real

    chips [10]. The new system structure is shown in figure 4.7. Smoothing filters will be

    further examined in section 4.7.

    Assuming we are averaging over 3 taps (a more sophisticated strategy will be
    discussed in section 4.7), define a smoothing operator:

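    $\mathrm{smooth} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 & & \\ & 1 & 1 & 1 & \\ & & \ddots & \ddots & \ddots \end{pmatrix}$    (4.58)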
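    The effect of a 3 tap averaging operator can be sketched in a few lines of Python. This is
    only an illustration: the function name smooth, the centred window and the handling of the
    first and last taps are assumptions made here, since the alignment used in the thesis is
    not recoverable from the text.

```python
def smooth(samples, window=3):
    """Average each tap with its neighbours (centred moving average).

    Edge taps are averaged over the taps that exist. This is an assumption
    made only to keep the output the same length as the input.
    """
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


# A sharp one-tap spike is spread over three taps by the averaging.
spike = [0.0, 0.0, 3.0, 0.0, 0.0]
smoothed = smooth(spike)
```

    A filter whose output is close to the target for only a single tap gains little from such
    averaging, which is why the optimizer is pushed towards outputs that stay near the target
    over the whole window.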


    [Block diagram: Transmitter, Equalizing filter, Bus, Smoothing filter, Receiver]

    Figure 4.7: System with smoothing filter at the receiver end.

    With the smoothing operator, now

    G smooth h in extend fir bus filterExtend fir fir bus

    (4.59)

    Moreover, the delay of the bus might not be the best sampling point. It was found
    that the best sampling point depends on the filter size and the bit time. For example, at
    300 ps bit time, equalizing filters designed with 1 tap of extra delay in addition to the
    bus delay give the highest eye height (81%) among all filters. This is also better than the
    system without the smoothing filter (74%). In the rest of this thesis, all testing results are

    obtained with the extra delay varied to give the best eye height.

    4.4.4 An example

    The following example shows the effectiveness of the equalizing filter approach (LP method)

    in cross-talk cancellation. I use the same bus parameters and filter size as the example given

    in section 4.3.3. A pseudo-random test sequence is used.

    For this particular example, the LP problem formulated has the following properties:

    fir .

    number of disturbance variables, d : 223. Thus, total number of variables is 240.


    number of constraints: 448.

    From figures 4.5 and 4.9, note that the eye-height for the filter designed by the LP

    method ( norm) is slightly higher than that for the LSQ filter ( norm), vs. .

    As expected, optimizing for eye-height produces greater actual eye-height than the average

    case optimization of the LSQ method. The eye width for LP is significantly smaller than

    that for LSQ, vs. . This is expected because the LP filter is optimized for eye-

    height at a specific sampling point, whereas the LSQ objective function considers the entire

    waveform. Section 4.5 presents further comparisons.

    The speed of FIR filter design with the LP method largely depends on the size of the

    LP problem formulated. Thus it depends on how many bits (number of disturbances) are

    used to design the filter and the size of the filter. The number of disturbances is determined

    by the length of the bus impulse response in bit time. The smaller the bit time, the larger the

    LP problem. For an filter design at 400 ps, the design finishes within a few seconds on a
    Linux box with an 800 MHz Pentium III CPU and 256 MB of memory. Based on this method, I

    investigated other variations of linear FIR filters, such as time-variant linear FIR filters and

    other types of smoothing filters.

    4.5 Testing results: Comparison of LSQ method and LP method

    4.5.1 Worst-case input sequence

    In sections 4.3.3 and 4.4.4, pseudo-random input sequences were used to measure the eye
    opening (eye height and eye width). By comparing the eye opening with and without the
    designed filter, we get an indication of the effectiveness of the filter. A shortcoming of
    using pseudo-random input sequences as test sequences is that the result varies considerably
    from run to run if the input sequence is not long enough. But simulation with a very long input


    [Upper panel: system without filters. Lower panel: system with 8*2 filters. Each panel plots the input signal and the output signal on wire 1, voltage (V) vs. t (ns).]

    Figure 4.8: Example of output signals for systems with (lower panel) and without (upper
    panel) the equalizing filter designed with the LP method.


    [Upper panel: eye diagram for system without filters (eye height 29%, eye width 75%). Lower panel: eye diagram for system with 8*2 filters (eye height 84%, eye width 50%). Axes: voltage (V) vs. t (ps).]

    Figure 4.9: Pseudo-random test: eye diagrams for systems with (lower panel) and without
    (upper panel) the equalizing filter designed with the LP method. Red traces indicate high
    signal transmitted. Blue traces indicate low signal transmitted.


    sequence takes a long time. Inspired by the LP filter design procedure, I used the worst-case
    input sequence for each filter instead of a pseudo-random input sequence as the testing

    sequence. Since the total disturbance from other bit times and other wires is the largest for

    the worst-case input, the eye opening is the smallest among all input sequences and hence

    the most representative.

    For a given set of filter coefficients, the worst-case input sequence input with length

    (where is defined in equation 4.48, is the number of taps per bit) can be found

    by:

    for every wire and every bit , calculate the resulting disturbance on wire 1 at a

    given sampling time. If the filter coefficients are obtained with the LP method, the

    sampling time is the same as what was used in the LP filter design. For an arbitrary

    set of filter coefficients, the sample point is not defined in advance. Instead, I consider

    every tap time as a possible sample point and select the one with the best eye height

    as the sample point. Accordingly, the worst-case input sequence is determined by

    finding the worst-case input for each possible sampling time and concatenating these

    sequences together.

    input = 1. We are looking for the largest negative disturbances

    when 1 is sent. Negation of this input sequence is also a worst-case input sequence.

    If the disturbance is positive, the corresponding input is -1. Otherwise the input is +1.
    This applies to every other wire and bit time considered.
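    The selection rule above can be sketched as follows. The table layout disturbance[w][k]
    and the function names are hypothetical and chosen only for illustration. The table is
    assumed to hold the signed effect on wire 1, at the sampling time, of sending +1 on wire w
    at bit offset k, with the current bit on wire 1 itself left out and passed separately as
    signal.

```python
def worst_case_inputs(disturbance):
    """Choose the +1/-1 input for every (wire, bit) that hurts a transmitted 1 most.

    An input of -1 is chosen where the disturbance is positive and +1 otherwise,
    so that every contribution to the sampled output is negative or zero.
    """
    return [[-1 if d > 0 else 1 for d in row] for row in disturbance]


def worst_case_level(signal, disturbance):
    """Sampled output for a transmitted 1 under the worst-case inputs."""
    inputs = worst_case_inputs(disturbance)
    total = sum(x * d
                for xs, ds in zip(inputs, disturbance)
                for x, d in zip(xs, ds))
    return signal + total


# Two wires, two bit offsets each (made-up numbers, not thesis data).
table = [[0.25, -0.125], [-0.0625, 0.03125]]
inputs = worst_case_inputs(table)
level = worst_case_level(1.0, table)
```

    The worst-case level equals the undisturbed output minus the sum of the absolute values of
    the disturbances, which is the quantity the LP constraints bound with the variables d.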

    In this section, all testing results are obtained with input sequences that are
    concatenations of the worst-case input sequence and pseudo-random input sequences, unless
    otherwise indicated.


    [Upper panel: eye diagram for system with 8*3 filters (eye height 80%, eye width 50%). Lower panel: eye diagram for system with 8*3 filters (eye height 92%, eye width 50%). Axes: voltage (V) vs. t (ps).]

    Figure 4.10: Worst-case test (upper panel) vs. pseudo-random test (lower panel): eye
    diagrams for systems with equalizing filters designed with the LP method. Red traces
    indicate high signal transmitted. Blue traces indicate low signal transmitted.


    [Plot: eye height (0% to 100%) vs. filter width (0 to 20), one curve each for filter lengths of 4, 8, 12 and 16 taps.]

    Figure 4.11: Worst-case performance of different equalizing filters designed with the LP
    method. Only equalizing filters with eye width greater than 25% are shown. Simulation
    parameters are: /cm, pF/cm, nH/cm, = 0.31, =
    0.23. Filter design parameters are: taps per bit = 4, bit time = 300 ps.

    The upper panel of figure 4.10 shows an eye diagram obtained with such a testing sequence.
    Comparing it with the eye diagram in the lower panel shows that the worst-case input
    sequence occurs rarely, and that there is a significant difference between the worst-case
    eye height (80%) and the pseudo-random eye height (92%). This suggests that if a certain
    bit error rate can be tolerated, for example by using an error correcting code, the
    maximum bit rate could be improved further.

    4.5.2 Indirect coupling

    For simplicity in the distributed coupled RLC model, I only considered capacitive and
    inductive coupling between adjacent lines. Thus, originally I thought that it should be
    enough to equalize one line when only information on this line itself and its two nearest neighbours


    Figure 4.12: Indirect coupling between non-adjacent lines

    are used by the filter design. For a three-bit interconnect, this method considers all wires

    and therefore does give the optimal result. However, for a bus with more than three lines,

    if we consider more lines instead of just adjacent lines (increase the filter width), better

    cross-talk cancellation can be achieved. Figure 4.11 shows the performance of filters with

    different sizes at 300 ps bit time. Compared with an filter, the filter has a much

    greater eye opening. This indicates that although the direct coupling between non-adjacent

    lines is weak and ignored in the interconnect network model, the indirect coupling between
    non-adjacent lines is strong and shouldn't be ignored in the equalizing filter design. Figure
    4.11 also shows that, for the bus considered, this trend nears its asymptote when the filter
    width is larger than 4.

    The indirect coupling between non-adjacent lines is illustrated in figure 4.12. From

    figure 4.12, a pattern of transfer function in the frequency domain was conjectured. That is,

    If this pattern existed, a simple filter considering all lines could be designed. Unfortunately,

    since we are considering far end noise cancellation, which is not only a function of input

    voltage but also a function of distance , this speculated pattern does not occur in practice.


    [Plot: eye diagram for system with 8*16 filters (eye height 99%, eye width 0%). Axes: voltage (V), about -15 to 15, vs. t (ps).]

    Figure 4.13: Eye diagram for system with equalizing filters designed by the LP
    method. Red traces indicate high signal transmitted. Blue traces indicate low signal
    transmitted.

    4.5.3 Over-fitting

    Compared with the LSQ method, the LP method is fast and guarantees worst-case
    performance. However, it has an over-fitting problem. Figure 4.14 shows that the
    magnitude of the overshoot increases with the size of the equalizing filter designed. This
    suggests that with more degrees of freedom, the optimizer tends to put more energy into the
    filter in order to get a higher eye height at the sampling time, which results in much
    greater overshoot. For some inputs, the output signal changes abruptly and comes close to
    the target only right at the sampling time, resulting in a larger eye height but also a much
    smaller eye width. This effect is clearly visible in figure 4.13, which shows an eye diagram
    obtained with an 8*16 filter designed with the LP method. The eye height of the diagram is
    the largest among all filters with length 8, but it barely has an eye opening, and has an
    enormous amount of overshoot.


    [Upper panel: overshoot (V, logarithmic scale from 1 to 1000000) vs. number of filter coefficients (0 to 300). Lower panel: overshoot (V, logarithmic scale from 1 to 1000000) vs. filter width (0 to 20), one curve each for filter lengths of 4, 8, 12 and 16 taps.]

    Figure 4.14: Magnitude of overshoot increases with the size of the equalizing filter designed
    with the LP method. Upper panel shows this trend with the number of filter coefficients as
    x axis. Lower panel shows this trend with the filter width as x axis, given filter length. All
    simulations are done with the same parameters as in figure 4.11.


    The lower panel of figure 4.14 shows that for a given filter length, the magnitude of the
    overshoot increases with the filter width. For filter widths less than 6, however, the
    magnitude of the overshoot doesn't follow a similar trend with the filter length. Thus, the
    filter width plays a more critical role than the filter length in the over-fitting problem.
    This could be explained by the fact that wires further away produce less disturbance on the
    wire than history bits on the wire itself. Thus with a longer filter, the optimizer could
    easily push the eye height up without putting more energy into the filter. When the filter
    is long enough to cover the most significant portion of the bus impulse response, this trend
    reaches its limit. As we can see from figure 4.11, at 300 ps bit time, filters with length
    8 taps already do a very good job in cross-talk cancellation. Compared with 12 tap filters,
    16 tap filters don't significantly improve the eye height.

    Moreover, for the bus considered, the improvement of eye height from increasing the filter
    width also stops when the filter width is more than 4 (see figure 4.11). So, very long and
    wide filters won't bring more benefit, yet the filter becomes more and more complicated and
    expensive. In this sense, the over-fitting problem of the LP method may not be a serious
    problem in practice.

    Furthermore, instead of trying to bring 1 tap as close to 1 as possible, the LP method
    can be easily formulated to bring 2 taps as close to 1 as possible. By doing this, sharp
    transitions are avoided and hence the amount of overshoot is decreased. In practice, this
    method reduces the severity of the over-fitting problem of the LP method when designing
    large filters.

    4.5.4 Minimum bit time

    To simplify design and yet achieve reasonable cross-talk cancellation, an important question

    is how many lines away should be considered when designing the equalizing filter. In other


    Equalizing filters designed with both the LP method and the LSQ method effectively

    improve the maximum bit rate of the bus.

    Equalizing filters designed with the LP method have better performance than
    equalizing filters designed with the LSQ method for every configuration considered. As
    discussed further below, the advantage for the LP method is most pronounced for
    wide filters.

    An filter is a good choice in terms of performance and cost. Although an
    filter does improve the eye height at lower bit rate (see figure 4.11), it has a similar
    minimum bit time to the filter.

    Note that width = 1 is separate pre-emphasis for each line. With width = 1, LSQ and LP
    have similar performance. Because the focus of this work is on cross-talk cancellation,
    high-frequency attenuation caused by skin effect is not built into the bus model. Because
    of this, the performance of the system without filter (width = 0) and that of systems with
    pre-emphasis filters (width = 1) are similar (740 ps vs. 679 ps). With cross-talk cancellation

    (width ), the performance of the bus is greatly improved (2.7 to 3.4 times higher bit rate

    than independent pre-emphasis). With width , LP is significantly better than LSQ. This

    might not be a completely fair comparison because all the LP results were obtained with

    an additional smoothing filter at the receiver end whereas the LSQ results were obtained

    without any smoothing filters. The LSQ method can be easily applied to the system with

    a smoothing filter at the receiver end. However, the LSQ method with smoothing is no

    better than LSQ with an increased number of taps. For example, an 8 tap filter designed by
    the LSQ method with a 3 tap smoothing filter is no better than a 12 tap filter designed by
    the LSQ method without any smoothing filter. Table 4.1 shows that the performance of

    the LP method is better than this upper bound of the performance of the LSQ method with


    4.6 Time-variant Linear FIR Filter

    A bit time consists of several tap times. In most examples presented previously, there are
    4 taps per bit. Among them, the first tap and the last tap are bit-transition taps. The other
    two are stable taps. The receiver samples stable taps and is insensitive to the input value at
    bit-transition taps. The FIR filter designed above has no information about bit transitions.
    It treats every tap the same, no matter whether it is a bit-transition tap or a stable tap.
    What if we treat them differently and let the filter know about bit transitions? This leads
    to the design of a time-variant linear FIR filter. The idea is to assign a separate set of
    filter coefficients to each tap within a bit. For example, for an 8 × 3 FIR filter and 4 taps
    per bit, the time-variant linear FIR filter contains 4 sets of 8 × 3 filter coefficients. By
    doing this, the optimizer can differentiate transition taps from stable taps, and assign
    different filter coefficients to their corresponding filters. The way this time-variant filter
    works is illustrated in figure 4.15. In the figure, different sets of filter coefficients are
    indicated by different line styles.
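    A minimal single-wire sketch of the idea follows. The thesis filters also span several
    wires, which is omitted here. The function names and the rule that output tap n uses
    coefficient set n mod taps per bit are assumptions made for illustration, not the thesis's
    exact indexing.

```python
def fir(samples, coeffs):
    """Ordinary FIR filter: one coefficient set for every tap time."""
    return [
        sum(c * samples[n - j] for j, c in enumerate(coeffs) if n - j >= 0)
        for n in range(len(samples))
    ]


def time_variant_fir(samples, coeff_sets):
    """Time-variant FIR filter: output tap n uses set n mod taps_per_bit."""
    taps_per_bit = len(coeff_sets)
    return [
        sum(c * samples[n - j]
            for j, c in enumerate(coeff_sets[n % taps_per_bit])
            if n - j >= 0)
        for n in range(len(samples))
    ]


# Two bits (+1 then -1) at 4 taps per bit.
samples = [1.0] * 4 + [-1.0] * 4
base = [0.5, 0.25]

# With all four sets equal, the time-variant filter reduces to the plain one.
same = time_variant_fir(samples, [base] * 4)

# With different sets, each tap position within a bit is filtered differently.
varied = time_variant_fir(samples, [[1.0], [0.0], [0.0], [2.0]])
```

    The first case shows that the ordinary FIR filter is a special case of the time-variant
    one, so allowing the sets to differ can only widen the set of achievable outputs.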

    This problem can be formulated as a linear programming problem in the same way as the
    original simple linear FIR filter design. The only difference between the two linear
    programming formulations is the way that the filter output is calculated. Let fir_1, ..., fir_m
    denote the sets of filter coefficient vectors, one for each of the m taps in a bit. The
    time-variant linear FIR filter coefficient vector is:

    $F = \begin{pmatrix} \mathrm{fir}_1 \\ \mathrm{fir}_2 \\ \vdots \\ \mathrm{fir}_m \end{pmatrix}$    (4.60)


    [Diagram showing the four coefficient sets FIR 1, FIR 2, FIR 3 and FIR 4.]

    Figure 4.15: The convolution procedure of the time-variant FIR filter. Different sets of filter
    coefficients are indicated by different line style.

    For a given input sequence input in bit time, the filter output is:

    filterOutput shuffle

    in

    in

    in

    in

    F (4.61)

    where in input with input fir bus , and shuffle

    is the following matrix:

    shuffle if div and div

    otherwise

    (4.62)

    The correctness of the time-variant FIR filter designed is checked by adding a set

    of equality constraints which specify that all four sets of filter coefficients are equal. After


    adding the equality constraints, this design gives the same set of filter coefficients as the
    simple FIR filter designed in section 4.4, and the same objective value. Since the simple
    filter is thus a feasible point of the time-variant design problem, the time-variant FIR
    filter should be at least as good as the simple FIR filter designed in section 4.4. For
    example, in the case of 4 taps per bit, and the same design parameters as for figure 4.10,
    the time-variant FIR filter has a worst-case eye height of 86% (vs. 80% for the simple FIR
    filter). The improvement in eye height shows that it does help to assign different filter
    coefficients to different taps. However, the benefit might not be large enough to justify
    any extra cost in an implementation.

    4.7 Optimized Smoothing Filter

    The system structure shown in figure 4.7 naturally leads to the topic of optimized smoothing

    filter design. In previous sections, a smoothing filter which simply averages over 3 taps was

    used. Test results show that this is a good choice. However, it is not optimal. For example,
    the eye diagrams suggest that the weights assigned to those 3 taps shouldn't be the
    same. The first tap and last tap are closer to the bit transition and should have smaller
    weights. The middle tap contributes more to the eye height and should be assigned a larger weight.

    Moreover, there is no reason to limit the window size of the smoothing filter to only 3 taps.

    The system where the coefficients of both the equalizing filter and the smoothing

    filter are taken as variables is not linear because output values of the bus depend on the

    product of the coefficients of the two filters