fir filter features on fpga - diva portal

Linköpings universitetSE–581 83 Linköping

+46 13 28 10 00 , www.liu.se

Linköping University | Department of Electrical EngineeringBachelor thesis, 16 ECTS | Elektroteknik

FIR Filter Features on FPGAFIR Filter Finesser på FPGA

Ahmed Akif

Supervisor : Oscar GustafssonExaminer : Michael Josefsson

http://www.liu.se

Upphovsrätt

Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare – under 25 årfrån publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstakakopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och förundervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva dettatillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. Föratt garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och admin-istrativ art. Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman iden omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sättsamt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sam-manhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende elleregenart. För ytterligare information om Linköping University Electronic Press se förlagetshemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet – or its possible replacement– for a period of 25 years starting from the date of publication barring exceptional circum-stances. The online availability of the document implies permanent permission for anyone toread, to download, or to print out single copies for his/hers own use and to use it unchangedfor non-commercial research and educational purpose. Subsequent transfers of copyrightcannot revoke this permission. All other uses of the document are conditional upon the con-sent of the copyright owner. The publisher has taken technical and administrative measuresto assure authenticity, security and accessibility. According to intellectual property law theauthor has the right to be mentioned when his/her work is accessed as described above andto be protected against infringement. For additional information about the Linköping Uni-versity Electronic Press and its procedures for publication and for assurance of documentintegrity, please refer to its www home page: http://www.ep.liu.se/.

c© Ahmed Akif

http://www.ep.liu.se/

http://www.ep.liu.se/

Abstract

Finite-length impulse response (FIR) filters are one of the most commonly used digital sig-nal processing algorithms used nowadays where a FPGA is the device used to implementit. The continued development of the FPGA device through the insertion of dedicatedblocks raised the need to study the advantages offered by different FPGA families. Thework presented in this thesis study the special features offered by FPGAs for FIR filtersand introduce a cost model of resource utilization. The used method consist of severalstages including reading, classification of features and generating coefficients. The resultsshow that FPGAs have common features but also specific differences in features as wellas resource utilization. It has been shown that there is misconception when dealing withFPGAs when it comes to FIR filter as compared to ASICs.

Sammanfattning

Finite-length impulse response (FIR) filter är en vanlig signalbehandling-algoritm därFPGA:n används för att implementera FIR filtret. En kontinuerlig utveckling av FPGA:ndär dedikerade block infördes skapade ett behov av att studera finesserna som erbjöds avolika FPGAs familjer. Det här arbetet presenterar olika finesser som är speciellt gjorda förFIR filter samt kostnads modell för resurserna. Metoden i det här arbetet består av olikadelar, bland annat att läsa datablad, klassificera finesserna och generera koefficienterna.Resultatet visar att FPGA:er har gemensamma finesser, men också avgörande skillnader.Det har visat sig att det finns missuppfattning när det kommer till hantering av FIR filterimplementering i FPGA:n jämfört med ASICs.

Acknowledgments

Firstly, I would like to express my special thanks to my supervisor Oscar Gustafsson for hisguidance that helped this work to see the light. I would also like to thank my examinerMichael Josefsson for his insightful tips that helped this report to be directed qualitativelyand professionally. I would also like to thank my family and friends for their support andencouragement.

Last but not least, I would like to thank myself for maintaining patience, persistence andcommitment to get this work done.

Linköping, June 2018Ahmed Akif

iv

Abbreviations

FPGA: Field Programmable Gate Array.

FIR: Finite-length Impulse Response.

DSP: Digital Signal Processing.

LUT: LookUp Table.

ASIC: Application Specific Integrated Circuit.

CLB: Configurable Logic Block.

IOB: Input/Output Block.

CLE: Configurable Logic Element.

SRAM: Static Random Access Memory.

LTI: Linear Time Invariant.

IIR: Infinite-length Impulse Response.

ACIN: A- Cascaded Input.

BCIN: B- Cascaded Input.

PCIN: P- Cascaded Input.

PCOUT: P- Cascaded Output.

SRL16: 1-16 Bit Shift Register Logic component.

KTH: Royal Institute of Technology, Stockholm.

IEEE: Institute of Electrical and Electronics Engineers.

v

Contents

Abstract iii

Acknowledgments iv

Abbreviations v

Contents vi

List of Figures viii

List of Tables x

1 Introduction 11.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Motivation and Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.4 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.5 Related Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Theory 42.1 Field-Programmable Gate Array (FPGA) . . . . . . . . . . . . . . . . . . . . . . . 4

2.1.1 Configurable Logic Block (CLB) . . . . . . . . . . . . . . . . . . . . . . . 52.1.2 Interconnections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.3 Input/Output Block (IOB) . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.1.4 DSP Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Digital Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2.1 Finite-length Impulse Response (FIR) Filter . . . . . . . . . . . . . . . . . 8

2.2.1.1 FIR Filter Structures . . . . . . . . . . . . . . . . . . . . . . . . . 82.3 Filter Complexity Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.3.1 FIR Filter Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.3.2 Input sharing of Multiple FIR filters . . . . . . . . . . . . . . . . . . . . . 112.3.3 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3.3.1 Exploiting Symmetry in Polyphase Interpolator FIR Filter . . . 12

3 Method 13

4 Results 144.1 Features of The DSP Slice Types . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.1.1 Virtex-5 FPGA DSP48E Slice . . . . . . . . . . . . . . . . . . . . . . . . . 144.1.2 The 7-Series FPGA DSP48E1 Slice . . . . . . . . . . . . . . . . . . . . . . 154.1.3 UltraScale FPGA DSP48E2 Slice . . . . . . . . . . . . . . . . . . . . . . . 16

4.2 FIR Filter Structures and Resource Utilization . . . . . . . . . . . . . . . . . . . 174.2.1 Pipelined Direct Form FIR Filter . . . . . . . . . . . . . . . . . . . . . . . 17

vi

4.2.2 Symmetric Pipelined Direct Form FIR Filter . . . . . . . . . . . . . . . . 174.2.3 Transposed Direct FIR Filter . . . . . . . . . . . . . . . . . . . . . . . . . . 194.2.4 Transposed Symmetric FIR Filter . . . . . . . . . . . . . . . . . . . . . . . 19

4.3 Generated Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194.4 Input Sharing of Multiple FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . 20

4.4.1 Input Sharing of Pipelined Direct Form FIR Filter . . . . . . . . . . . . . 204.4.2 Input Sharing of Symmetric Pipelined Direct Form FIR Filter . . . . . . . 214.4.3 Input Sharing of Transposed FIR filter . . . . . . . . . . . . . . . . . . . . 224.4.4 Exploiting Symmetry in Polyphase Interpolator FIR Filter . . . . . . . . 23

4.5 Generated Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5 Discussion 245.1 ASIC vs FPGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5.1.1 Benefits of an ASIC Based Circuit . . . . . . . . . . . . . . . . . . . . . . . 245.1.2 Benefits of an FPGA Based Circuit . . . . . . . . . . . . . . . . . . . . . . 25

5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255.2.1 Pipelined Direct Form vs Transposed Direct form . . . . . . . . . . . . . 255.2.2 Resource Utilization in Direct Form DSP48E vs. DSP48E1 and DSP48E2 255.2.3 Input Sharing in FIR Filter with ASIC Perspective . . . . . . . . . . . . . 26

5.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265.4 Ethics of Manufacturing Companies . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.4.1 Assessment Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275.4.2 Precautionary Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.5 Source Criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6 Conclusion 296.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Bibliography 30

vii

List of Figures

2.1 An overview of a typical FPGA architecture. The red boxes indicate a numberof CLB blocks distributed over an FPGA. The green boxes indicate input/outputblocks of an FPGA. The blue boxes indicate DSP blocks where the focus of thiswork lies. The vertically and horizontally distributed lines indicate interconnec-tions between the blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Simplified internal structure of a typical CLE. The purpose of a LUT is to imple-ment mathematical operations such as addition. The LUT box and its three-inputsis described in figure 2.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.3 Simplified internal structure of a LUT. The three-inputs of a LUT control the selec-tion of the 1/0 input bit of the multiplexers. . . . . . . . . . . . . . . . . . . . . . . . 6

2.4 Simplified network of interconnections. SB refers to Switch Boxes and CB refers toConnection Boxes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.5 An example of a signal flow graph as an alternative representation of the differ-ence equation presented above. The ak indicate FIR filter coefficients and x(n ´ k)indicate delayed versions of input signal x[n]. . . . . . . . . . . . . . . . . . . . . . . 8

2.6 Direct form structure of a FIR filter. x(n) and y(n) are input and output sequencerespectively. h[0] . . . h[N ´ 1] represent the coefficients. . . . . . . . . . . . . . . . . 9

2.7 Transposed direct form structure of the FIR filter where input signals are fed intomultipliers directly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.8 Impulse response of 9-tap linear phase FIR filter. . . . . . . . . . . . . . . . . . . . . 102.9 Symmetric FIR filter structure in direct form. Each multiplier represent two coef-

ficients and therefore the total usage of multipliers are decreased to half. . . . . . . 102.10 Symmetric FIR filter structure in transposed direct form. Each multiplier represent

two coefficients and therefore the total usage of multipliers are decreased to half.The dashdotted box indicates one Look-Up table. . . . . . . . . . . . . . . . . . . . 10

2.11 Input Sharing of two FIR filters h and g in direct form. Input delay element isshared between the filters h and g to save resource usage. . . . . . . . . . . . . . . . 11

2.12 The concept of polyphase interpolator. The input x[n] feds into subfilters and sentto output y[n]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.13 Symmetric coefficients of two FIR filters in transposed direct form. . . . . . . . . . 12

4.1 Simplified schematic of a DSP48E slice. ACIN, BCIN and PCIN are the onlyinputs to the slice while ACOUT and PCOUT are the only outputs. The dashedbox indicates a closed DSP slice block where it is prohibited to reach any desiredsignal inside this block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.2 Simplified schematic of a DSP48E1 slice. ACIN, BCIN, PCIN and D are the onlyinputs to the slice while ACOUT and PCOUT are the only outputs. The dashedbox indicates a closed DSP slice block where it is prohibited to reach any desiredsignal inside this block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

viii

4.3 Simplified schematic of a DSP48E2 slice. ACIN, BCIN, PCIN, D and F are theonly inputs to the slice while ACOUT and PCOUT are the only outputs. Thedashed box indicates a closed DSP slice block where it is prohibited to reach anydesired signal inside this block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.4 Non-symmetric direct form FIR filter structure in DSP48E, DSP48E1 and DSP48E2where the area marked in dashed box indicates one DSP slice. . . . . . . . . . . . . 17

4.5 Symmetric pipelined direct form FIR filter structure in DSP48E where the dashedbox indicates one DSP slice and the dashdotted box indicates one LUT. . . . . . . . 18

4.6 Symmetric pipelined direct form FIR filter structure in DSP48E1 where the dashedbox indicates one DSP slice and the dashdotted box indicates one LUT. . . . . . . . 18

4.7 Transposed direct form FIR filter structure in DSP48E, DSP48E1 and DSP48E2. . . 194.8 Transposed symmetric form FIR filter structure in DSP48E, DSP48E1 and DSP48E2

where a dash box indicates one DSP slice and a dashdotted box indicates one LUT. 194.9 Input sharing of two FIR filters in pipelined direct form. No resource reduction is

obtained because the resources are not shared between the two filters. . . . . . . . 204.10 Input sharing of two symmetric direct FIR filter in DSP48E. Input delay buffer and

pre-adder are shared between the filters g and h. . . . . . . . . . . . . . . . . . . . . 214.11 Input sharing of two symmetric direct FIR filters in DSP48E1 and DSP48E2. Input

buffer delay is shared between the filters g and h. . . . . . . . . . . . . . . . . . . . 224.12 Input sharing of two FIR filters in transposed direct form. No resource reduction

is obtained because the resources are not shared between the filters g and h. . . . . 224.13 Symmetric pairs technique of 12 coefficients interpolated by 3 polyphase subfilters. 23

5.1 Mapping of two pipelined direct form FIR filter in ASIC. . . . . . . . . . . . . . . . 26

ix

List of Tables

4.1 Utilized resources of single N-tap FIR filter in DSP48E, DSP48E1 and DSP48E2. . . 204.2 Utilized resources of K parallel N-tap FIR filters in DSP48E, DSP48E1 and DSP48E2. 23

x

1 Introduction

Digital signal processing has gained much popularity in the digital world in the recentdecades due to its ability to transform digital and/or analog signals in an efficient way byreducing the cost associated with the design and implementation. Therefore it became acompetitor to analog signal processing by replacing a big amount of its applications. Digitalsignal processing can be used in various applications, some of which are speed, image andaudio processing. Although DSPs could be implemented in digital signal processors,themilestone in digital signal processing implementation came in the 1980’s with the introduc-tion of the field programmable gate arrays (FPGA) [8].

Finite-length impulse response (FIR) filters are one of the most commonly used digital signalprocessing algorithms. FPGA is the hardware used nowadays to implement FIR filter be-cause of its high-speed, low cost and low power dissipation. There are many different FPGAmanufacturers with different hardware architectures.

1.1 Background

In 1985, the first commercial FPGA device XC2064 was launched by Xilinx. Xilinx is a well-known company in FPGAs industry. The XC2064 device had two three-input lookup tables(LUT) and minimal amount of logic gates.

Over time, a number of features have been added to perform different tasks. The continuousincrease in customer demands led to developing the FPGA where dedicated blocks, such asDSP blocks have been added to perform specific functions including FIR filter.

1.2 Motivation and Purpose

Nowadays, modern FPGAs have millions of logic gates and has become much more com-plex than its first launch in 1985. The continued development of the FPGA device throughthe insertion of dedicated blocks raised the need to study the advantages offered by differ-ent FPGA families and then determine the consequences of using a certain FPGA instead ofanother one. Therefore, the purpose of this work is to in-depth study the features offered by

1

1.3. Research Questions

different FPGA families that are specially designed for implementing FIR filters and therebyintroduce a cost model in form of resource utilization to help the designer to choose an ap-propriate design when implementing the FIR filter after taking into consideration differentscenarios that may occur.

1.3 Research Questions

The study of this work can be formulated in the following questions:

1. What features do different FPGA families have aimed at FIR filters?

2. How much of resources are needed in implementing FIR filter to meet certain scenarios?Example of scenarios include:

• Filter order.

• Different stages of pipelining.

• Impulse response symmetry.

These questions are the guidance through the whole work in order to keep the limitationsvalid and to choose an appropriate method.

1.4 Delimitations

The work limits itself to study the DSP slices in different FPGA families that can be used toimplement the FIR filter. The studied DSP slices are Xilinx DSP48E, DSP48E1 and DSP48E2.The focus of this work is to introduce a cost model based on resource utilization.

The time for this work is ten weeks long, therefore only the Xilinx DSP slices mentioned aboveare going to be studied. Note that there is no implementation of FIR filter presented in thiswork.

1.5 Related Research

There are several popular implementation methods used in today’s industry to map FIRfilter on FPGAs. Implementation methods discussed in [7] are: (1) traditional multiplicationand addition design, (2) distributed arithmetic with relative multiplierless design, (3) addand shift, i.e. convert multipliers into adders, subtractors and shift operators and eliminatethese operators afterwards and (4) implement a soft core by CoregenTM from Xilinx. Theresult implemented on Xilinx Spatan 3 device showed that the total occupied slices in add-and shift method were much less compared with the other three methods.

A study made by Kadam et al. [6] where an analysis of various FIR filter structures madeto implement an efficient FIR filter on FPGA. Fast FIR architecture, pipelined and parallelprocessing architectures was implemented and analyzed. Pipelined structures in form ofinserting delay elements after the adders will not increase the area. Parallel processing leadsto more use of resources due to increased sample rate. To implement parallel structure forfilter order 3, 3N multiplications and 3(N ´ 1) additions are needed. Fast FIR structuresreduce the resources by using 2N multiplications and 2N + 4 additions compared to theparallel structure. The study concludes that the pipelined structure is the best alternativewhen it comes to resource reduction.

An another study was made by Selvakumar et al. [9] was made to present an efficient parallelFIR filter structure in symmetric convolutions. Since the multiplier consumes more hardware

2

1.5. Related Research

than the adder, the multipliers has been replaced by additional adders in both pre-processingand post-processing adder block in FIR filter allowing the number of increased adders toremain the same when the length of the FIR filter becomes large.

It should be mentioned that the work in this thesis differentiate itself from the researchespresented above by studying the features in different DSP slices rather than presenting im-plementation methods.

3

2 Theory

This theory part is divided into three sections. The first section is going to describe the ma-jor building blocks in modern FPGAs, what these blocks consist of and what they are used for.

The second section is going to present the mathematical equation for a digital FIR filter andthe representation of the equation in a signal flow graph form in order to make different FIRfilter structures.

The third section is going to describe a number of techniques in order to optimize the FIRfilter complexity.

2.1 Field-Programmable Gate Array (FPGA)

An FPGA is an integrated circuit that can be programmed after manufacturing to functionas any digital circuit determined by the designer. The key difference between FPGA andApplication Specific Integrated Circuit (ASIC) is that the ASIC can only be used for a cer-tain application. The major building blocks in modern FPGAs consist of Configurable LogicBlocks (CLBs), Interconnections, Input/Output Blocks (IOBs) and embedded blocks such asDSP blocks. An overview of the architecture of modern FPGAs is shown in figure 2.1 [3].

4

2.1. Field-Programmable Gate Array (FPGA)

CLBD S P

I/O

I/O I/OI/O

I/O

I/O

I/O

I/O I/O I/O

I/O

I/O

CLB CLB CLB

CLBCLB

CLBCLBCLBD S P

D S P

Figure 2.1: An overview of a typical FPGA architecture. The red boxes indicate a number ofCLB blocks distributed over an FPGA. The green boxes indicate input/output blocks of anFPGA. The blue boxes indicate DSP blocks where the focus of this work lies. The verticallyand horizontally distributed lines indicate interconnections between the blocks.

2.1.1 Configurable Logic Block (CLB)

The CLB is used to implement logic functions and mathematical operations. Inside eachCLB, are so called Configurable Logic Elements (CLEs) which in turn consists of Look-UpTable(LUTs), D-flip flops and a 2-to-1 multiplexers, as illustrated in figure 2.2 [3].

LUT D-flip flop

ABC

Figure 2.2: Simplified internal structure of a typical CLE. The purpose of a LUT is to im-plement mathematical operations such as addition. The LUT box and its three-inputs is de-scribed in figure 2.3

The core of the FPGA consists of thousands of CLE copies connected with each other. TheLUT is made of a series of cascaded multiplexers where the LUT inputs are used as the selectlines and the inputs to multiplexers is a 1-bit SRAM memory that is set to either 0 or 1. Theinternal structure of a LUT is shown in figure 2.3 [3].

5


1 0

1 0

1 0

1 0

1 0

1/01/0

1/01/0

1/01/0

1/01/0

1 0

ABC

1 0

Figure 2.3: Simplified internal structure of a LUT. The three-inputs of a LUT control the selec-tion of the 1/0 input bit of the multiplexers.

2.1.2 Interconnections

As illustrated in figure 2.1, a big part of the FPGA’s architecture is utilized by vertically andhorizontally distributed interconnections. These interconnections consist of switch boxesand connection boxes in order to get the desired connection [3]. Each CLB is surrounded bya group of connection boxes that in turn are connected to each other via a group of switchboxes, as shown in figure 2.4. These connection boxes are used to connect different CLBs toeach other and furthermore provide connection to both I/O blocks and embedded blocks [3].

CLB

CBSB SB

CB

SB CB SB

CB

Figure 2.4: Simplified network of interconnections. SB refers to Switch Boxes and CB refersto Connection Boxes.

6


The placement of interconnections plays an important role in determining the flexibility andefficiency of the FPGA. Therefore, manufacturers nowadays offer flexible construction of in-terconnections including sets of both short and long wires in order to efficiently perform anydigital circuit.

2.1.3 Input/Output Block (IOB)

Input/Output blocks are placed all around FPGA’s edges, as shown in figure 2.1 where theIOBs function as a communication interface between the internal signals and the externalpins.

2.1.4 DSP Block

Although a Configurable Logic Block is capable of performing arithmetic operations andstoring data, it would be too slow and utilizes a huge amount of resources because a CLBis designed to be relatively general in order to perform different types of functions. As thedemand for the FPGA is increasing, modern FPGAs offer embedded blocks to perform spe-cific functions, some of which are memory blocks and DSP blocks. The introduction of theseembedded blocks clearly reduce area utilization, power consumption and as a consequenceresults in better performance [2].

Although the internal structure of DSP blocks and its related details are different dependingupon manufacturer and device version, the overall architectures of most DSP blocks lookalike.

A DSP block consists of an adder and multiplier followed by an accumulator. The aim ofintroducing DSP blocks are to enhance performance of these arithmetic operations. Also,there are specific connections in each DSP block that can be used to connect multiple DSPblocks to each other which in turn can be used to implement an efficient FIR filter [2].

7

2.2. Digital Filter

2.2 Digital Filter

A digital filter is used to modify signal characteristics in the time and/or frequency domain.A digital signal is obtained by sampling the continuous signal in different time frames afterwhich the signal is represented as a sequence of discrete values, unlike the analog signalwhich is continuous and represented as a function of time [10].

Digital FIR filter algorithms can be described mathematically by the difference equation

y[n] =N

ÿ

k´1

ak ¨ x(n ´ k) or by signal flow graphs that are represented as block diagrams [10].

T

x[n]

x[nk]

y[n]ak ak

Figure 2.5: An example of a signal flow graph as an alternative representation of the differ-ence equation presented above. The ak indicate FIR filter coefficients and x(n ´ k) indicatedelayed versions of input signal x[n].

The fundamental components used in digital filters, as shown in figure 2.5, are multipliers,adders and delay elements. While difference equations are used to describe the relationbetween the input and output sequence and are therefore an external behavioral of the filter,signal flow graphs are more detailed and are used to exactly compute the output values y[n]and the internal values in the filter [10].

A Linear time invariant (LTI) filter is one of the most commonly used digital filters. A LTIfilter can be described by its impulse response h[n]. Impulse response h[n] is the outputobtained by applying an impulse as an input sequence. The output y[n] is obtained by theprocess of convolution that convolve the impulse response and input sequence in the timedomain and can mathematically be presented as y[n] = x[n] ¨ h[n] .

2.2.1 Finite-length Impulse Response (FIR) Filter

LTI filters are usually divided into two types which are finite-length impulse response (FIR)and infinite-length impulse response (IIR). Finite-length impulse response means that theimpulse response becomes zero after a finite number of samples. In contrast, the infinite-length impulse response has an infinite number of sample values [10]. The work presentedin this thesis covers the FIR filter only.

2.2.1.1 FIR Filter Structures

There are different ways to construct FIR filter. Two common ones are classified as 1) directform structure and 2) transposed direct form structure.The direct form structure is the classical implementation of the FIR filter equation shown insection 2.2. The input samples are delayed by using delay elements in form of D-flip flops.The direct form structure depends upon so called chain adder where input data is partiallycomputed and outputted through a chain of small adders. The chain adder technique leadsto significant reduce in area utilization. The critical path of the FIR filter is the longest path

8

2.3. Filter Complexity Optimization

of logic between two registers. Latency is the number of inserted pipelined registers betweeninput and output. Figure 2.6 shows the direct form structure of a FIR filter [4].

x[n] T

h[0] h[1]

T T

h[2] h[N 1]

y[n]

Figure 2.6: Direct form structure of a FIR filter. x(n) and y(n) are input and output sequencerespectively. h[0] . . . h[N ´ 1] represent the coefficients.

The transposed direct form is obtained from the direct form after applying the transpositiontheorem. Transposition is obtained by replacing inputs with outputs and replacing outputswith inputs where the inputs are directly connected to multipliers [10]. The critical path oftransposed structure reduced significantly in comparison with direct form.

T T

h[0]

y[n]T

h[N 1]

x[n]

h[N 2] h[N 3]

Figure 2.7: Transposed direct form structure of the FIR filter where input signals are fed intomultipliers directly.

2.3 Filter Complexity Optimization

There are several techniques that are practically used for FIR filter to improve filter imple-mentation by reducing area usage, time and power dissipation.

2.3.1 FIR Filter Symmetry

A FIR filter has the ability to obtain linear phase which in turn implies that the impulseresponse h(n) presents symmetry where h(n) = h(N ´ n) where N is the filter order andx[n] are the input samples. Figure 2.8 shows the impulse response of 9 coefficients FIR filterwhere figure 2.9 and figure 2.10 show the representation of an odd number FIR filter in signalflow graph in direct form and transposed direct form respectively.

The math behind resource reduction when using symmetry follows the distributive propertyof multiplication as illustrated in the following example: (x ¨ y) + (x ¨ z) = x(y + z) while theleft hand side of the equation performs two multiplications and one addition, the right handside performs only one multiplication and one addition [1].

9


h[0]h[1]

h[2]

h[3]

h[4]

h[5]

h[6]h[7]

h[8]

Figure 2.8: Impulse response of 9-tap linear phase FIR filter.

The implementation of impulse response symmetry in figure 2.8 can be shown in figure 2.9where the input delay chain is fed forward and backward to save resources. Two symmetriccoefficients are now represented with a single multiplier.

T T

h[(N 1)/2]h[0]

x[n]

h[1]

T

TTT

y[n]

Figure 2.9: Symmetric FIR filter structure in direct form. Each multiplier represent two coef-ficients and therefore the total usage of multipliers are decreased to half.

The impulse response symmetry can also be implemented in transposed direct form wherethe total usage of multipliers are decreased to half, as shown in figure 2.10.

T y[n]T

h[N 1]

x[n]

T

h[N 2]

Figure 2.10: Symmetric FIR filter structure in transposed direct form. Each multiplier repre-sent two coefficients and therefore the total usage of multipliers are decreased to half. Thedashdotted box indicates one Look-Up table.

10


2.3.2 Input sharing of Multiple FIR filters

The input sharing of multiple FIR filters are beneficial when dealing with large numbers offilters because of its ability to save area. The input sharing of the direct form FIR filter can beshown in figure 2.11 where the delay element is shared between two filters.

x[n] T T T

y[n]

g[0]

h[0]

g[1]

h[1]

g[2]

h[2]

g[N 1]

h[N 1]

y[n]

Figure 2.11: Input Sharing of two FIR filters h and g in direct form. Input delay element isshared between the filters h and g to save resource usage.

The input sharing can also be used in transposed direct form, however, it would not be bene-ficial in saving area when using non-symmetric coefficients because of no resources are goingto be shared.

2.3.3 Interpolation

The interpolation process is a multi-rate filter technique used to increase sample rate orsample frequency in order to obtain a certain function.

The reduction in memory and computation time is obtained by so called polyphase interpo-lator where the interpolation process is divided into multiple stages. The input signal x[n]is fed through the polyphase interpolator which consist of subfilters placed in parallel. Af-terwards, each subfilter outputs the corresponding input sample and send it to output y[n]through a commutator, as shown in figure 2.12 [4].

h0[n]

h1[n]

h2[n]

hp1[n]

x[n] y[n]

Figure 2.12: The concept of polyphase interpolator. The input x[n] feds into subfilters andsent to output y[n].

11


2.3.3.1 Exploiting Symmetry in Polyphase Interpolator FIR Filter

As described in subsection 2.3.3, a polyphase interpolator divides the symmetric coefficientsof interpolation filter into subfilters. The subfilters can be symmetrical or non-symmetrical.When the coefficients of the subfilters are symmetrical (as for linear phase), the transposeddirect form structure can be obtained as shown in figure 2.13.

T

h[0]

X[n]

h[1]

T y[n]

y[n]

Figure 2.13: Symmetric coefficients of two FIR filters in transposed direct form.

However, the coefficients of subfilters are not necessarily symmetric in themselves. There-fore, a technique called symmetric pairs is performed to obtain symmetry when coefficientsof subfilters are not symmetric [4]. The number of subfilters is determined by the factor L.

For simplicity, the following example demonstrates the symmetric pairs technique for a 12coefficients interpolated filter by factor 3. The filter coefficients in the example below can beseen as impulse response in figure 2.8 but with 12 coefficients instead of 9 coefficients.

The filter coefficients 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1 produce the following subfilters:H1 = 1 , 4 , 6 , 3H2 = 2 , 5 , 5 , 2H3 = 3 , 6 , 4 , 1

The example above shows that the subfilter H2 is symmetric while the other two filters, H1and H3 are non-symmetric. By using the symmetric pairs technique, new subfilters producedwhere:H1 = 1 + 3 , 4 + 6 , 6 + 4 , 3 + 1 = 4 , 10 , 10 , 4H2 = 2, 5, 5, 2H3 = 3 ´ 1 , 6 ´ 4 , 4 ´ 6 , 1 ´ 3 = 2 , 2 , ´2 , ´2

Symmetry are now obtained in H1 and H3, although H3 has a negative symmetry, and thearea reduction is satisfied according to symmetry features discussed earlier.

12

3 Method

In order to study how FIR filter is mapped on Xilinx FPGAs, the process is divided intoseveral stages.

The first stage is to determine the DSP slice versions that will be studied and they are DSP48Efor FPGA Virtex-5, DSP48E1 for 6- and 7-series FPGAs and DSP48E2 for Ultrasclae FPGA.Afterwards, the data sheets of corresponding DSP slice version are studied and an overvieware taken of each DSP slice version.

The second stage is to focus on the unique features in each DSP version that are specificallydesigned for the implementation of the FIR filter. The focus is to determine the features of acertain DSP slice which is not found in the other DSP versions. Afterwards, different types offilter structures are studied were each filter structure is mapped in a different way. The focusis on presenting a cost model in form of utilized DSP slices and then determine the latency ofthe corresponding structure. The utilized DSP slices and clock cycles latency are calculatedseparately.

The third stage is the simulation of various types of FIR filter structures on the coefficientsgenerator and then comparing the results obtained from the generator with those found incorresponding data sheets. The generator used for coefficients generating is Xilinx COREGenerator version 5.0 for the Virtex-5 DSP48E and version 6.3 for the 7-series DSP48E1 andUltraSclae DSP48E2.

The Xilinx CORE Generator software is used for generating coefficients of just single FIR filterbecause the input-share feature is not supported by the software. In addition, the symmetryof transposed structure is not supported either.

13

4 Results

The results presented in this chapter are divided as follows:The features of the three DSP slice types are presented. Afterwards, different FIR filter struc-tures and their resource utilization are presented for single FIR filter. The last section inthis chapter presents the different FIR filter structures and the associated resource utilizationwhen multiple FIR filters share the input signal.

4.1 Features of The DSP Slice Types

Xilinx first introduction of the DSP48 slice architecture is though to deliver programmabledevices with more flexibility and higher capability of performing logic operations and im-plementing digital signal processing applications. The three DSP slice types discussed beloware considered an improvement of the DSP48 slice version while the features of the presentedDSP slice types have a lot of similarities but also important differences. The main differencesare discussed below.

4.1.1 Virtex-5 FPGA DSP48E Slice

The DSP48E slice is an extension of the DSP48 slice as mentioned above. The DSP48E comeswith all the features introduced in the DSP48 and even more. One of the important featuresincluded in DSP48E is the 25 ˆ 18 two’s-complement multiplier instead of 18 ˆ 18 two’s-complement multiplier offered in DSP48 [12]. The DSP48E has no pre-adder.

14

4.1. Features of The DSP Slice Types

T

T

T

T T

BCIN

PCIN PCOUT

ACIN ACOUT

Figure 4.1: Simplified schematic of a DSP48E slice. ACIN, BCIN and PCIN are the onlyinputs to the slice while ACOUT and PCOUT are the only outputs. The dashed box indicatesa closed DSP slice block where it is prohibited to reach any desired signal inside this block.

The add and subtract feature in DSP48 is extended to even handle logic operations in DSP48E.The DSP48E has a cascade path ACIN, BCIN and PCIN in form of input paths and ACOUTand PCOUT as an output paths. The input and output paths are used to cascade multipleDSP slices with each other and resulting in so-called cascaded chain [12].It should be mentioned that the DSP48E include more improvements, but they are less im-portant in this work.

4.1.2 The 7-Series FPGA DSP48E1 Slice

The Xilinx 7-series FPGA DSP48E1 slice fundamental resources consist of 25 ˆ 18 two’s- com-plement multiplier, 48-bit accumulator and a 25-bit pre-adder. The pre-adder contribute inreducing the number of utilized resources and thereby leads to low power consumption whensymmetrical coefficients are used [13]. The pre-adder has a 25-bit input vector D. The pur-pose of the input vector D is to increase capacity of the ACIN path. The presence of input,output and intermediate pipelined registers also serve in increasing throughput. Figure 4.2shows an overview of a DSP48E1 slice architecture.

T

T

TBCIN

PCIN PCOUT

T TACIN ACOUT

T

D

Figure 4.2: Simplified schematic of a DSP48E1 slice. ACIN, BCIN, PCIN and D are the onlyinputs to the slice while ACOUT and PCOUT are the only outputs. The dashed box indicatesa closed DSP slice block where it is prohibited to reach any desired signal inside this block.

15

4.1. Features of The DSP Slice Types

4.1.3 UltraScale FPGA DSP48E2 Slice

The DSP48E2 slice is an improved version of the DSP48E1 discussed in subsection 4.1.2. Themain improvements include optimizing the 25 ˆ 18 two’s complement multiplier to a 27 ˆ 18two’s complement multiplier, 27-bit pre-adder instead of 25-bit pre-adder in DSP48E1. Thepre-adder has a 27-bit input vector D. The pre-adder has more flexibility by selecting inputsfrom the input paths ACIN or BCIN. The adder capability is increased by introducing a thirdinput operand F instead of two-input adder in DSP48E1 [14]. Figure 4.3 shows an overviewof a DSP48E2 slice architecture.

T

T

TBCIN

PCIN PCOUT

T TACIN ACOUT

T

D

F

Figure 4.3: Simplified schematic of a DSP48E2 slice. ACIN, BCIN, PCIN, D and F are theonly inputs to the slice while ACOUT and PCOUT are the only outputs. The dashed boxindicates a closed DSP slice block where it is prohibited to reach any desired signal insidethis block.

16

4.2. FIR Filter Structures and Resource Utilization

4.2 FIR Filter Structures and Resource Utilization

The resource utilization is dependent upon the FIR filter structure, as discussed below.

4.2.1 Pipelined Direct Form FIR Filter

The pipelined direct form FIR filter structure in DSP48E, DSP48E1 and DSP48E2 looks alikewith respect to area utilization. However, they have different latency.

One of the most important characteristics of DSP48E slice, as well as DSP48E1 and DSP48E2is its ability to cascade multiple DSP slices to shape certain functions such as a FIR filter.The cascade function is performed by connecting multiple adders in series. The cascade-addition is applied by connecting the input and the output of one DSP slice to the neighbourDSP slice resulting in higher performance and reduction in utilized resources because thelogic outside DSP slice blocks is not used. The minimal use of the FPGA-logic gives betterefficiency and less resource utilization [4]. Figure 4.4 shows the structure of the direct formFIR filter structure for the slice types DSP48E, DSP48E1 and DSP48E2.

Y[n]

X[n]

T T

TT Th[0]

h[N 1]

T

h[1]

T T

Th[2]

T

T T

T

Figure 4.4: Non-symmetric direct form FIR filter structure in DSP48E, DSP48E1 and DSP48E2where the area marked in dashed box indicates one DSP slice.

The direct form FIR filter structure in DSP48E, DSP48E1 and DSP48E2 utilize N DSPslice/block for N non-symmetric coefficients.

The initial latency for the slice DSP48E is 5 + N clock cycles[12], while the initial latencyfor the slice DSP48E1 is 6 + N [13] and an initial latency of 6 + N clock cycles for the sliceDSP48E2 [14].

4.2.2 Symmetric Pipelined Direct Form FIR Filter

The direct form FIR filter gives the opportunity to use coefficient symmetry to minimize therequired DSP slices without affecting the functionality and speed. By taking advantage ofthe pre-adder, the symmetric pipelined direct form FIR filter requires N/2 DSP slice for Ncoefficients. However, symmetric pipelined direct form FIR filter makes use of logic outsideDSP blocks, in the FPGA.The utilized logic in DSP48E depends upon input bit width W and coefficients N where theutilized DSP slices is ((W + 1) ¨ (N/2)) + (W/2). The initial latency is 6+ N clock cycles [11].Figure 4.5 shows the symmetric pipelined direct form FIR filter structure in DSP48E.

17

4.2. FIR Filter Structures and Resource Utilization

T T T

T

T

T

T

Th[1]

T

T

T

T

T

Th[(N 1)/2]

T

T

T

T

Th[0]

y[n]

x[n]

Figure 4.5: Symmetric pipelined direct form FIR filter structure in DSP48E where the dashedbox indicates one DSP slice and the dashdotted box indicates one LUT.

The resource utilization in DSP48E1 is the same as DSP48E with respect to the number ofutilized DSP slices, i.e. N/2 DSP slice. However, the Look-UP table logic utilization is wayless than DSP48E, as illustrated in figure 4.6. The utilized logic in DSP48E1 is [N/16] LUTsand the initial latency is 7 + N where one DSP slice takes 8 clock cycles, two DSP slices take9 clock cycles, and so on [13].

T

T

T

Th[0]

y[n]

Tx[n] TT

T T T T T T T T

T

T

T

T

T T

h[1]

TT T

T

T

T

T

Th[(N 1)/2]

Figure 4.6: Symmetric pipelined direct form FIR filter structure in DSP48E1 where the dashedbox indicates one DSP slice and the dashdotted box indicates one LUT.

The resource utilization in DSP48E2 is the same as DSP48E1 with respect to the number ofutilized DSP slices and logic, i.e. N/2 DSP slice +[N/16] LUTs.

18

4.3. Generated Results

4.2.3 Transposed Direct FIR Filter

The transposed direct FIR filter utilize N DSP slices for N non-symmetric coefficients forDSP48E, DSP48E1 and DSP48E2. However, the three DSP slice types have different latency.The DSP48E and DSP48E2 have a constant latency of 6 clock cycles whereas the transposeddirect form FIR filter structure in DSP48E1 has a constant latency of 8 clock cycles. Figure 4.7shows the FIR filter structure of a transposed direct form in DSP48E, DSP48E1 and DSP48E2.

T

T

T

T

T

T

T

T

h[0]

Y[n]

T

T

T

T0

h[N 1]

X[n]

h[N 2]

Figure 4.7: Transposed direct form FIR filter structure in DSP48E, DSP48E1 and DSP48E2.

4.2.4 Transposed Symmetric FIR Filter

The transposed symmetric FIR filter utilize N/2 slices for N non-symmetric coefficients forDSP48E, DSP48E1 and DSP48E2. However, they have different latency. The DSP48E andDSP48E2 have a constant latency of 6 clock cycles where the transposed symmetric form FIRfilter structure in DSP48E1 has a constant latency of 8 clock cycles. Figure 4.8 shows the FIRfilter structure of a transposed direct form in DSP48E, DSP48E1 and DSP48E2.

T y[n]T

h[N 1]

x[n]

T

h[N 2]

Figure 4.8: Transposed symmetric form FIR filter structure in DSP48E, DSP48E1 and DSP48E2where a dash box indicates one DSP slice and a dashdotted box indicates one LUT.

4.3 Generated Results

The following table shows the generated results of the utilized resources of single FIR fil-ter in DSP48E, DSP48E1 and DSP48E2. The generation of coefficients is run by using CoreGenerator software tool by Xilinx.

19

4.4. Input Sharing of Multiple FIR Filters

Filter Structure DSP Slice Type DSP Slices Latency LUTsNon-symmetric direct DSP48E N 5 + N NoneNon-symmetric direct DS48E1 N 6 + N NoneNon-symmetric direct DS48E2 N 6 + N NoneSymmetric direct DSP48E N/2 6 + N (W + 1) ˆ (N/2) + (W/2)Symmetric direct DSP48E1 N/2 7 + N N/16Symmetric direct DSP48E2 N/2 6 + N N/16Transposed direct DSP48E N 6 NoneTransposed direct DSP48E1 N 8 NoneTransposed direct DSP48E2 N 6 NoneSymmetric transposed DSP48E N/2 6 W ˆ NSymmetric transposed DSP48E1 N/2 7 NSymmetric transposed DSP48E2 N/2 6 N

Table 4.1: Simulated results of utilized resources of single N-tap FIR filter in DSP48E,DSP48E1 and DSP48E2.

4.4 Input Sharing of Multiple FIR Filters

The results presented in the table above present the resource utilization of the three DSP slicetypes DSP48E, DSP48E1 and DSP48E2 with respect to a single FIR filter. The results presentedin the following sections belong to the input sharing of multiple FIR filters.

4.4.1 Input Sharing of Pipelined Direct Form FIR Filter

Input sharing of K FIR filter in DSP48E, DSP48E1 and DSP48E2 gives K ˆ N DSP slices for Nnon-symmetric coefficients. Figure 4.9 illustrates input sharing of two FIR filters in pipelineddirect form. The initial latency is the same for respective type of DSP slice as discussed in thesingle FIR filter in subsection 4.2.1.

X[n]

T

T Tg[0]

T

T

T T

T T T T

T T

h[0]

T

T

T T

T

h[1]

g[1] g[2]

Y[n]T

T

T

h[N 1]

T

TT

Y[n]TT

h[N 1]h[2]

Figure 4.9: Input sharing of two FIR filters in pipelined direct form. No resource reduction isobtained because the resources are not shared between the two filters.

20


4.4.2 Input Sharing of Symmetric Pipelined Direct Form FIR Filter

Unlike FIR filter structures in non-symmetric direct and transposed form, the symmetricpipelined direct form FIR filter structure utilizes LUTs logic which in turn can be used ininput sharing.In DSP48E symmetric FIR filter, LUT logic usage is preferred more than a DSP slice. The datais pre-added before the multiplication with the coefficients. The introduction of chain inputdelay buffer saves logic by using an SRL16E shift register [11]. The adder chain is reducedby half due to symmetry which in turn leads to reduction in latency. Figure 4.10 shows thestructure of two symmetric pipelined direct form FIR filter in DSP48E.

T T T

T

T

T

T

Th[1]

T

T

T

T

T

Th[(N 1)/2]

T

h[0]

y[n]

x[n]

T

T

T

T g[1]

T

T

T

T

g[0]

T

T

T

T

y[n]

g[(N 1)/2]

T

T

T

T

Figure 4.10: Input sharing of two symmetric direct FIR filter in DSP48E. Input delay bufferand pre-adder are shared between the filters g and h.

In DSP48E1 and DSP48E2, the initial latency for respective DSP types still the same as men-tioned in the single FIR filter. However, the input sharing of multiple filters give logic utiliza-tion of [N/16] LUTs. The figure 4.11 illustrate input sharing of 2 symmetric direct FIR filtersin DSP48E1 as well as DSP48E2.

21


T

T

T

Tg[0]

y[n]

Tx[n] TT

T T T T T T T T

T

T

T

T

T T

g[1]

TT T

T

T

T

T

Tg[(N 1)/2]

T T

T

T

T

T

Th[0]

T T

T

T

T

T

T

h[1]

T T

T

T

T

T

T

h[(N 1)/2]

y[n]

Figure 4.11: Input sharing of two symmetric direct FIR filters in DSP48E1 and DSP48E2. Inputbuffer delay is shared between the filters g and h.

4.4.3 Input Sharing of Transposed FIR filter

Input sharing of K FIR filter in DSP48E, DSP48E1 and DSP48E2 gives K ˆ N DSP slices forN non-symmetric coefficients as illustrated in figure 4.12. The initial latency is the same forrespective type of DSP slice as discussed in the single FIR filter.

X[n]

T TT

T Th[0]Th[N 1]

T Tg[0]Tg[N 1]

T TT

T TT

T T Y[n]T0

g[N 2]

T TT

h[N 2]

T T Y[n]T0

Figure 4.12: Input sharing of two FIR filters in transposed direct form. No resource reductionis obtained because the resources are not shared between the filters g and h.

22

4.5. Generated Results

4.4.4 Exploiting Symmetry in Polyphase Interpolator FIR Filter

The symmetric pairs technique is obtained by adding and subtracting the output of subfiltersH1 and H3, as discussed in section 2.3.3 . Afterwards, the subfilters H1 and H3 are scaled bya factor of a hal f to obtain the output of the primary filter, as illustrated in figure 4.13.

x[n]

y[3n]h1[n]

h2[n]

h3[n]

1/2

1/2

y[3n+1]

y[3n+2]

Figure 4.13: Symmetric pairs technique of 12 coefficients interpolated by 3 polyphase subfil-ters.

The symmetric pairs technique can be applied when using an even number of coefficientsinterpolated by 2, for example: H = 1, 2, 3, 3, 2, 1 introducing subfilters H1 = 1, 3, 2 and H2 =2, 3, 1. The subfilters are not symmetric in themselves, therefore symmetric pairs techniqueis needed to obtain symmetry. However this technique is not required when the numberof coefficients is odd and interpolated by 2, for example: H = 1, 2, 3, 4, 3, 2, 1 introducingsubfilters H1 = 1, 3, 3, 1 and H2 = 2, 4, 2 do not need symmetric pairs technique becausefilter symmetry is already satisfied. By using the symmetric pairs technique, N coefficientstake N/2 DSP slices, as in normal symmetry, and N DSP slices otherwise.

4.5 Generated Results

The following table shows the generated results of the utilized resources of K parallel N-tapFIR filter in DSP48E, DSP48E1 and DSP48E2. The generation of coefficients is run by usingCore Generator software tool by Xilinx.

Filter Structure DSP Slice Type DSP Slice Latency LUTsNon-symmetric direct DSP48E K ˆ N 5 + N NoneNon-symmetric direct DS48E1 K ˆ N 6 + N NoneNon-symmetric direct DS48E2 K ˆ N 6 + N NoneSymmetric direct DSP48E K ˆ (N/2) 6 + N (W + 1) ˆ (N/2) + (W/2)Symmetric direct DSP48E1 K ˆ (N/2) 7 + N N/16Symmetric direct DSP48E2 K ˆ (N/2) 6 + N N/16Transposed direct DSP48E K ˆ N 6 NoneTransposed direct DSP48E1 K ˆ N 8 NoneTransposed direct DSP48E2 K ˆ N 6 NoneSymmetric transposed DSP48E K ˆ (N/2) 6 N ˆ WSymmetric transposed DSP48E1 K ˆ (N/2) 7 NSymmetric transposed DSP48E2 K ˆ (N/2) 6 NInterpolator non-symmetric direct DSP48E K ˆ N 5 + N NInterpolator symmetric transposed DSP48E1 K ˆ (N/2) 7 + N NInterpolator symmetric pairs DSP48E2 K ˆ N 6 + N K ˆ (N/2)

Table 4.2: Simulated results of utilized resources of K parallel N-tap FIR filters in DSP48E,DSP48E1 and DSP48E2.

The generated results presented above applies to all Filter structures in table 4.2 where thenumber of filters are k parallel N-tap FIR filter.

23

5 Discussion

In this section, the results obtained will be discussed and analyzed as well as the methodologyused in this work. Finally, the work is going to be seen in a wider perspective where theethical and environmental effects of the manufacturing process of electronics are discussed.

5.1 ASIC vs FPGA

When designing a system, it is of importance to decide in advance which integrated circuit isgoing to be used. ASIC and FPGA based circuits have many characteristics in common, butalso many differences. A brief introduction of the most common differences will be presentedbelow.

5.1.1 Benefits of an ASIC Based Circuit

One of the most important aspects when designing a system is its performance. ASIC pro-vides approximately 3.2 higher performance than the FPGA according to a comparison studyof more than 20 different designs [2].

Another benefit of using ASIC instead of FPGA is the low power consumption associatedwith ASIC based systems. An FPGA contains a lot of logic in order to cover the implemen-tation of different functions, which in turn lead to more power consumption than an ASICbased design. Although FPGA designs aimed for low power consumption customers exist,most are not. [2].

Conceptually, the most important feature offered by an ASIC based design and what makesit more preferable than an FPGA based design is its flexibility. In spite of the fact that themodern FPGAs contain dedicated blocks, these blocks are designed to be very general inorder to perform many different functions. The drawback of having blocks for general usemake the FPGA less desired for many users. In contrast, an ASIC can be designed by auser to implement any circuit as well as implementing self-specialized blocks for the desiredapplication [2].

24

5.2. Results

5.1.2 Benefits of an FPGA Based Circuit

In addition to ASIC, An FPGA also contains considerable advantages. An FPGA can be usefulwhen designing a prototype where test and evaluation process can be performed on parts ofthe design, even when the design is not fully completed [2]. In contrast, manufacturing partsof an ASIC will be expensive.The low cost of the FPGA device makes it available even for hobbyists to start using theFPGA. In contrast, the tool cost for ASIC is much more than the FPGA and can therefore notbe affordable by many users.An FPGA provides configurability through the big amount of embedded logic. The logic canbe used to design any desired circuit. However, the availability of logic lead to that someof the logic will not be used which in turn can lead to lower performance and high powerconsumption compared to an ASIC design. The self-configurability feature in an FPGA canhelp the designer to keep the design private and not hand it over to an outside party. Thisfeature gains more interest when dealing with designs that contain confidential informationsuch as cryptographic codes [2].

5.2 Results

The results are going to be discussed in the following sections.

5.2.1 Pipelined Direct Form vs Transposed Direct form

The results presented for single FIR filter and input sharing of FIR filter show that the utilizedDSP slices for mapping a FIR filter on FPGA is the same for direct and transposed direct formFIR filter structure. However, they have different clock cycle latency. Another factor thatmight be of interest in analyzing the direct and transposed structure are their throughput.In transposed structure, the input chain sends data from the input to the output with nodelay element in-between which in turn can lead to speed challenges. However, the latencyof direct form FIR filter increases by increasing filter taps (as shown in table 4.1) while thelatency in transposed direct form is constant.

As the focus of this work is primarily on resource utilization, the transposed form structureis still preferred when the number of utilized DSP slices for mapping a FIR filter on FPGA isthe same as in the corresponding direct form structure. It should be noted that both directand transposed structures do not use Look-Up table logic in the implementation of the FIRfilter which in turn saves a lot of resources.

5.2.2 Resource Utilization in Direct Form DSP48E vs. DSP48E1 and DSP48E2

When dealing with symmetric coefficients in pipelined direct form structure, the latency inthe three DSP slice types have approximately the same latency. However, it is of importanceto shed light on resource utilization differences between DSP48E on one hand, DSP48E1 andDSP48E2 on the other hand.

Although symmetric coefficients utilize the same amount of DSP slices in all three structures,the utilized LUT logic is different. Symmetric pipelined direct form FIR filter in DSP48E slice,as shown in figure 4.5, implement the chain input as well as the adder in the LUT logic whichin turn lead to significant resource utilization and power dissipation. In contrast to DSP48Eand as the use of symmetric coefficients increased by users, symmetric pipelined direct formFIR filter in DSP48E1 and DSP48E2 have traded-off DSP slices (as shown in figure 4.6) byinserting the pre-adder and the chain input delay into the DSP slice while the logic has been

25

5.3. Method

changed from the LUTs as an operation mode into SRL16E. The 16-bit shift register SRL16Eis used to improve performance and area efficiency.

5.2.3 Input Sharing in FIR Filter with ASIC Perspective

In the single FIR filter, mapping a DSP slice into FPGA is considered beneficial by efficientutilization of the area. However the implementation of input sharing FIR filter can be lesseffective in some cases.In figure 4.9 as an example, input sharing of two pipelined direct form FIR filters is shown.Each DSP slice can be thought of as a closed box where it is prohibited to reach the signalsinside this box. By assuming that the two pipelined direct form FIR filter are replaced by aten FIR filter of the same type, it would be less practical to use the direct form structure asthe delay elements of the shared input can not be used to reduce area utilization. In contrastto FPGA, ASIC can be used to implement the two pipelined direct form FIR filter shown infigure 4.9 as illustrated in figure 5.1 where the input delay chain is shared between the twofilters resulting in effective utilization of resources.

X[n] T Tg[0]

h[0]

g[1]

h[1]

T y[n]

T y[n]

Figure 5.1: Mapping of two pipelined direct form FIR filter in ASIC.

Symmetry in pipelined direct form FIR filter, as shown in figures 4.10 and 4.11, use LUTslogic in addition to DSP slices. In parallel implementation of a FIR filter, the input sharingis practical because of the savings of resources. However, the DSP slices in DSP48E on onehand, DSP48E1 and DSP48E2 on the other hand use logic differently. As it is preferred to usethe DSP48E1 and DSP48E2 approach in single FIR filter as discussed earlier, it is obvious thatthe approach might not be as good when using many filters in parallel. Therefore, the morenumber of a FIR filter the better it is to not use the built-in registers in order to save power.

Symmetry in transposed direct FIR filter is not included in Xilinx Core Generator. However,the symmetry of transposed FIR filter in ASIC can be obtained as shown in section 2.3.1.

5.3 Method

Traditionally, the FIR filter structures are viewed as adders, multipliers and delay elementsthat are connected in series when they are mapped in hardware. This is partially true but notthe wholly true because the target device in implementing the FIR filter determines the viewof how to look at the structure.

In ASIC, FIR filter resources are counted as adders, multipliers and delay elements. In con-trast, the counting of FPGA is done by determining the number of DSP slices and Look-Up

26

5.4. Ethics of Manufacturing Companies

tables. As discussed earlier, there are difference in calculating resources as adders, multi-pliers and delay elements than calculating DSP slices. The reason is that an ASIC baseddesign offers more flexibility in reaching the required signal and thus designing the desiredapplication while it is not the case for an FPGA because of the pre-construction of the DSPslices by the FPGA manufacturer.

FIR filter structures generator Core Generator by Xilinx does have some limitations such asmissing an option of utilizing symmetry in transposed direct form as well as sharing input ofseveral FIR filter structures. Efforts has been made to think individually how it would lookslike using symmetry in this case. Taking the transposed direct structure in figure 2.6 as anexample, the filter symmetry can be obtained by implementing the adder chain placed in thebottom of the structure in the LUTs logic instead of DSP slice.

The work presented in this thesis has some deficiency. There is a factor of difference in clockcycle latency between the generator results and the data sheets of the three DSP slice typesthat the author has not figured out the reason behind. If the number of clock cycle latencyin generator results has matched the corresponding latency in data sheets, the results wouldhave presented clearly and precisely.

It should also be mentioned that this work does not contain all the possible scenarios thatcould affect the resource utilization in implementing the FIR filter, such as decimation of theFIR filter.

As long as the target device manufacturer is Xilinx and the results are taken from the genera-tor created by the same manufacturer, the results provide high quality and should be consid-ered reliable. Thus, the results would be the same if this work is done again using the samemethodology discussed in this work.

5.4 Ethics of Manufacturing Companies

Since the arise of the technology in the late 1980’s the world has changed dramatically, andtechnology is so dominant in our daily lives. Electronics and more specifically integrated cir-cuits such as FPGAs have played a key role in creating and developing systems and allowingmore people to engage in technology. In order to understand the process of manufactur-ing electronics, a closer look is taken to analyze how electronic manufacturing companiesproduce products and how they handle ethical issues. The analysis process is based on as-sessment schedule made by [5] and the precautionary principle.

5.4.1 Assessment Schedule

There are many stakeholders in a manufacturing company including customers, suppliersand owners that play an important role in driving a company. It is obvious that most ofthe time the customers need a functionable products that serve their needs, no matter howenvironment-friendly the products are, and at the same scale, almost all owners care onlyabout how much money they can make. The relation between the company’s behavior andcustomers’ needs can be changed by raising awareness among customers about the impor-tance of taking a part by forcing the companies to change their morals and conditions so thatit matches customers’ needs and not exposing them into danger by using such products.

A similar example would be when the car manufacturer Volkswagen cheated in gas ex-hausters by manipulating software to read false data than the actual tests would give. Whenpeople figured out Volkswagen’s cheat, they stopped using these cars and the company hadto change their strategy by introducing better cars. According to utilitarianism, the producing

27

5.5. Source Criticism

of electronic circuits benefit the majority of people that use these products and therefore theydo not care about small consequences. Unlike utilitarianism, the deontological ethics con-sider the act instead of the consequences. For instance, an employee follows their boss ordersbecause it is his duty to do so. New solutions could be presented, such as making strict regu-lations on extraction of raw materials, increasing rates on non-environmental power sources.It is applicable if there are political willpower [5].

5.4.2 Precautionary Principle

Precautionary principle based on scientific certainty is crucial and all companies should there-fore take it into consideration to benefit the whole society, for instance if we take it frompower consumption point of view, we can efficiently consume power and it would lead toreducing greenhouse effects. Regulations on scientifically proven high quality raw materialsand the use of them lead to reducing of bad consequences that would harm humans and so-ciety. These raw materials would be cleaner and environment friendly when recycling them.Strong precautionary principle also applies on security issues in form of insulting humanrights, disease causing and integrity insulting by selling products that is not fully safe andnot mentioning the bad consequences that could possibly happen [5].

5.5 Source Criticism

The sources used in this work is a combination of books, datasheets and Ph.D. theses. Thebook authored by Hakan Johansson and Lars Wanhammar [10] at Linkoping university ispublished in 2011. Hakan Johansson is a professor that works in Linkoping university andhas authored several books that discuss linear systems, digital filters and more. Thus, thesource [10] should be reliable. The source [5] is written by Sven Ove Hansson who is aprofessor in philosophy in Royal Institute of Technology (KTH) in Stockholm. The book ispublished in 2008 and considered new. Hansson has published approximately 30 books inrelative areas to the one used in this work. The KTH reputation and the author positiongives him trustworthiness. The use of the articles [9], [6] and [7] are taken from IEEE websitewhich is the world’s largest professional organisation in technology and electronics field.Furthermore, the source [7] has been cited by three other papers which gives it validity.

Ph.D. theses have also been used. They were done in well-know universities in the worldand are relatively new which are considered a positive sign.

The majority of information in this work comes from the official data sheets and white paperspresented by Xilinx. The data sheets of DSP slice types are well-known in industry and comesfrom Xilinx which itself invented the FPGA circuit.

28

6 Conclusion

It has been shown that the different types of DSP slices have different features that are spe-cially designed for FIR filter implementation. The utilized resources are different accordingto filter structures which lead to different trade-offs between area, speed and power dissipa-tion.

FPGA based system designs and ASIC based system designs are two two different systems.There is a common misconception between users when dealing with these two different sys-tems. The resources of an FPGA are DSP slice-based where each DSP slice is considered aclosed block where its internal signals are unreachable. In contrast, ASIC resources consistof adders, multipliers and delay elements. Access on ASIC signals can be reached almostanywhere in the circuit

6.1 Future Work

According to the limited time that is set for this work to be done, this thesis discussed onlyone manufacturer. It would be of interest to cover more manufacturers in order to determinewhich manufacturer’s approach implement the most effective FIR filter with respect to thescenarios presented in this work.

29

Bibliography

[1] Matthew John Dallmeyer. “Reducing FIR Filter Costs: A Review of Approaches as Ap-plied to Massive Fir Filter Arrays”. PhD thesis. University of Dayton, 2014.

[2] Andreas Ehliar. “Performance driven FPGA design with an ASIC perspective”. PhDthesis. Linköping University Electronic Press, 2009.

[3] Umer Farooq, Zied Marrakchi, and Habib Mehrez. “FPGA architectures: An overview”.In: Tree-based Heterogeneous FPGA Architectures. Springer, 2012, pp. 7–48.

[4] LogiCORE IP Product Guide. “FIR Compiler v7.2”. In: ().

[5] Sven Ove Hansson and Kalle Grill. Teknik och etik. KTH: s filosofienhet, 2008.

[6] Mahesh Kadam, Kishor Sawarkar, and Sudhakar Mande. “Investigation of suitableDSP architecture for efficient FPGA implementation of FIR filter”. In: Communication,Information & Computing Technology (ICCICT), 2015 International Conference on. IEEE.2015, pp. 1–4.

[7] Ying Li, Chungan Peng, Dunshan Yu, and Xing Zhang. “The implementation methodsof high speed FIR filter on FPGA”. In: Solid-State and Integrated-Circuit Technology, 2008.ICSICT 2008. 9th International Conference on. IEEE. 2008, pp. 2216–2219.

[8] Uwe Meyer-Baese and U Meyer-Baese. Digital signal processing with field programmablegate arrays. Vol. 2. Springer, 2004.

[9] J Selvakumar, Vidhyacharan Bhaskar, and S Narendran. “FPGA based efficient fast FIRalgorithm for higher order digital FIR filter”. In: Electronic System Design (ISED), 2012International Symposium on. IEEE. 2012, pp. 43–47.

[10] Lars Wanhammar and Håkan Johansson. Digital filters using Matlab. Department ofElectrical Engineering, Linköping University, 2011.

[11] UG073 Xilinx. XtremeDSP for Virtex-4 FPGAs User Guide. 2007.

[12] UG193 Xilinx. Virtex-5 FPGA XtremeDSP Design Considerations. 2017.

[13] UG479 Xilinx. 7 Series DSP48E1 Slice User Guide. 2018.

[14] UG579 Xilinx. “UltraScale Architecture DSP Slice User Guide”. In: ().

30

fir filter features on fpga - diva portal

Documents