alberto qunchaor

Upload: uday-kumar

Post on 03-Jun-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Alberto QunchaOr

    1/4

    Signal Processing Implementations Using Simulink

    Alberto Quinchanegua, William SnchezAdvisor: Domingo Rodrguez

    Electrical and Computer Engineering DepartmentUniversity of Puerto Rico, Mayagez Campus

    Mayagez, Puerto Rico [email protected], [email protected]

    Abstract

    This document presents a methodology for theimplementation of signal processing applications basedin a Kronecker product language to map a mathematicalformulation in an algorithm formulation, andSimulink, which is a dynamic system and simulation

    software based in MATLAB for modeling and highlevel simulation. Simulinkadds many features specificto dynamics systems while retaining all MATLABgeneral purpose functionality. A mathematicalformulation can be expressed in terms of functionalmodules with defined input and output ports that can beinteractively interconnected using Simulink in order toreduce the algorithm development and implementationtime-line. Two important signal processing applicationssuch as FIR filters and fast Fourier Transform aretreated here as systems, that were implemented inhardware using a system generator toolbox, whichtranslate a Simulink model in a hardware description

    language (HDL) for FPGA implementations.

    1. Introduction

    Simulinkis a dynamic system and simulation softwarethat offers an interactive scientific and engineeringenvironment for system modeling, analysis, andsimulation. This environment is useful for the rapidimplementation of a digital signal processingapplication in terms of functional blocks, and providesa high-level simulation. A functional block is a basicstructure that can represent a function, or a specializedsystem, with defined input and output ports andcustomized parameters. In this work we concentrated in

    the implementation of digital signal processingalgorithms using block diagrams for the data flowmodeling, and algorithm testing and implementation.For this purpose, we introduce a methodology for theDSP algorithm development and hardwareimplementation, based in a Kronecker productlanguage, in order to assist to a developer in thegeneration of hardware prototype systems. We definean algorithm development and implementationenvironment as an aggregated of the following tools:

    Simulink, DSP microprocessors, and FPGAcomputing structures. We first introduce themathematical formulations for two importantapplications in digital signal processing, such as FIRfilters, and the fast Fourier transform; thus we proceedto map the mathematical formulations in algorithmsexpressed as interconnected functional blocks that can

    be translated in C code for DSP microprocessorimplementations, or in VHDL code using a SystemGenerator for FPGA implementations. Hardwareimplementation results of our methodology are

    presented using a Xilinx Virtex-II FPGA computingunit.

    2. Modeling DSP Applications

    Simulink uses a set of libraries for signal processingto represent dynamic systems. The standard blocklibrary is organized into several subsystems, grouping

    blocks according with the behavior and it contributeswith the design of new blocks by a developer, forincrease the modularity of the applications. On theother hand, the reconfigurability of a developed modelis not complicated because the cut, copy, andpaste commands are available to include blocks bysoftware. The Figure 1 shows an example of aSimulinkmodel built with functional blocks.

    Figure 1.An example of a Simulinkmodel composedof interconnected functional blocks.

  • 8/13/2019 Alberto QunchaOr

    2/4

    In order to introduce the methodology for the modelgeneration and the implementation of DSP applicationswe introduce the mathematical formulations for FIRfilters and the FFT, using a Kronecker productslanguage, and then we present the procedure todescribe an algorithm in terms of functional blocks fortesting, simulation, and code generation for hardwareimplementation.

    2.1 Mathematical Formulations

    2.2.1 FIR Filters

    A simplest FIR system or operator, apart from thetrivial system, i.e., the system represented by the

    identity operator IN, is the system{ }1h

    T T

    = , which

    turns into the shift operator SN . This system is

    sometimes called the unit delay system because itsdigital electronics hardware implementation may be

    accomplished by using a single delay element, which

    acting on the unit sample signal { }1 results in

    { } { }[ ]T j S S S

    j ZN

    j

    N

    N

    1 1

    1= = = . The matrix

    representing the shift operator SN is obtained byallowing the vector representation (with respect to the

    standard basis set { }{ }k Nk Z: ) of the signal

    { } { }{ }T k Zk N 1 , , become the columns of thematrix. An important property of the SNoperator matrixis that any cyclic FIR system T

    hmay be represented by

    a matrixHN which can be written as a sum of powersof the matrix SN pre-multiplied by a diagonal matrix

    [ ]D

    h j, as follow:

    [ ] [ ]( )

    ==N NZj Zj

    jN

    jNjhN SjhSDH . (1)

    2.2.2 Fast Fourier Transform

    The Fourier matrix can be factorized in terms of sparsematrices represented by Kronecker products factors,which play an important role in the development and

    implementation of Fourier Transform algorithms. Forexample, the traditional Cooley-Tukey algorithm forthe fast computation of the Fourier Transform can berepresented in terms of Kronecker products factors asfollows:

    ( ) ( ) RNSRSNSRN PFITIFF ,, = (2)

    where N is the size of the FFT, N=R.S, and IN is anidentity matrix. In this case, N is a power of two. A

    new variant for the expression of the matrix NF is

    derived using Kronecker product properties in order tocompute a N-Point FFT, where Ncan be a non-powerof two. The new mathematical formulation can bewritten as follows:

    ) ( ) SNpSLpT

    N PFUITIFUF ,222,222 = (3)

    where pN 4= , and p is a prime. The expressions (2)

    and (3) allow us to identify functional primitives orbasic structures that compose an algorithm. Thesefunctional primitives can be modeled as blockdiagrams, using the mathematical expression to definethe data flow for the algorithm computation. Forexample, consider the Kronecker factor

    222 FUI p that represents a first stage in the

    FFT computation. Graphically we can determine thedata flow generated by the action of the Kroneckerfactor over an input vector of length 4p, being p=1;

    then:

    ( ) xFxFUI

    = 2222 1

    1

    10

    01 (4)

    ( ) x

    F

    F

    F

    F

    xFUI

    =

    2

    2

    2

    2

    222

    0

    0

    0

    0

    (5)

    where

    =11

    112F .

    The expressions (4) and (5) below represent the actionof matrices 2F over segments of the input vector x ,

    where the operations over a same segment areduplicated, generating a data scattering stage. Thisscattering represents a parallel processing stage that

    permits to organize the stages of Twiddle factorspresent in the traditional Cooley-Tukey algorithm in asingle parallel stage of Hadamard products. A diagonal

    matrix SLT , contains the twiddle factors. The same

    procedure explained below is carried out to identify the

    dataflow generated by the factor ( )pT IFU 222 ,which is translated in a final stage of data gathering.

    2.2. Functional Blocks

    2.2.1 FIR Model:

    In a real life a designer has to do the mapping from themathematical representation of a FIR filter, to symbolicrepresentation in Simulink. From the mathematicalformulation previously presented in the expression (1),

  • 8/13/2019 Alberto QunchaOr

    3/4

    the designer can observe the implicit mathematicaloperations such as Add, Multiplication, and Delay andtranslate these in Simulinkfunctional blocks. For a 4-tap FIR filter implementation, Simulink offers theDelay Blocks, Multiplication Blocks, and AdditionsBlocks that the designer can interconnect to obtain theFIR filter structure for simulation and implementationas shows the Figure2.

    Figure 2. Simulink model of a 4-tap FIR Filter.

    Notice in the figure below that the FIR filter structurehas a repetitive structure conformed by a delay block, amultiplication block, and an adder block as shows theFigure3. After the model simulation, the designer canmake new functional blocks, creating new librarieswith defined input and output ports, to be used in futureapplications.

    Figure 3.Identification of a new functional block forthe implementation of a FIR filter.

    2.2.2. 4p-FFT Model

    The 4p-FFT algorithm can be represented in Simulink,

    in terms of interconnected fuctional blocks for analysisand simulation. Each functional block is associatedwith a functional primitive, simulating the algorithmdata flow in every stage of the FFT computation. Theinitial stage of the FFT can be implemented as a

    butterfly stage followed by a scattering stage as shows

    the Figure 4. A functional block representing thedataflow for the butterfly is shown in the Figure 5.

    -1

    a

    b

    a+ b

    a- b

    a+b

    a+b

    a-b

    a-b

    Figure 4.Flow graph of a Butterfly followed by a

    scattering stage.

    a

    ba+b

    a

    ba-b

    In_Port_1

    In_Port_2

    Out_Port_1

    Out_Port_2

    Figure 5.Butterfly functional block for theimplementation of the factor 222 FUI p .

    The others stages of the FFT computation are theHadamard product or point-by-point multiplications,and the data gathering. Thus, the complete 4p-FFT

    algorithm can be modeled as shows the Figure 6.

    DATA

    SCATTERING

    HADD

    AMARD

    PRODU

    CTWITH

    TWIDDLE

    FACTORS

    D

    ATA

    GATHERING

    4p-FFT Dataflow Model

    INPUT OUTPUT

    Figure 6.Data flow model for the 4p FFT computation.

    3. Hardware Implementations

    This section presents preliminary results of the FPGAhardware implementation of a 4-tap FIR filter, and anew variant of the traditional Cooley-Tukey FFTalgorithm for lengths that are not necessarily power oftwo. The FIR filter previously presented in theFigure 2 was implemented in a Virtex-II FPGA using aSystem Generator toolbox for the generation of VHDLcode. The implementation results of the FIR filter arecompared with a previous implementation of the same4-tap FIR filter in a TI-C6711 DSP microprocessor and

    summarized in the Table 1.

  • 8/13/2019 Alberto QunchaOr

    4/4

    Table 1.Hardware implementation resultsof a 4-Tap FIR filter.

    ITEM Virtex-IIFPGA

    TI-C6711 DSP

    -Processor.Data Bus

    Input1~n* Bits

    Fixed Point32 Bits Floating

    Point.

    Data BusOutput

    1~n* BitsFixed Point

    32 Bits FloatingPoint.

    System ClockPeriod

    10 nanoseconds(max. 100

    MHz)

    6.7 nanoseconds(150 MHz)

    Latency 1 clock cycle** 49 clock cycles

    Simulation Logic Devices,Bit stream.

    Registers, Bytes,Words.

    ProgrammingLanguage

    VHDL C, assembly

    Scalability Data BusWidth

    Fixedarchitecture.

    * n32 using Xilinx Logic Cores.** Initial latency: 11 clock cycles.

    The 4p-FFT algorithm is based in a Kroneckerformulation that permits to organize the stages ofTwiddle factors present in the traditional Cooley-Tukeyalgorithm in a single parallel stage of Hadamard

    products, reducing the latency time generated by everymultiplication stage. The algorithm was simulated inSimulink and the model translated in a hardwaredescription language (VHDL) using a System

    Generator Core for FPGA implementation. The Figure7 shows a block diagram of the developed hardware

    structure composed of interconnected functionalblocks.

    HADAMARD

    Z-1

    ADD

    &

    SUB

    Z-2

    ADD

    CONTROL

    F

    I

    F

    O

    SCATTERING GATHERING

    ADD

    &

    SUB

    4p-FFT CORE

    N=4p

    Fixed Point

    16-BIT

    COMPLEX

    xA

    16-BIT

    COMPLEX

    xB

    32

    32

    16-BIT

    COMPLEX

    X[k]

    X[0]

    X[1]

    X[2]

    X[N-1]

    32

    32

    32

    32

    1

    2

    3

    N/2

    Figure7Block diagram of the 4p- FFT hardwarestructure.

    The 4p-FFT model was implemented in hardware usinga Xilinx Virtex-II FPGA and the results are presentedin the Table 2.

    Table 2.FPGA implementation results of the4p-FFT model.

    N-

    Complex

    Point

    Slice

    Consumption

    Initial

    Latency

    Time

    (Cycles)

    System

    Clock

    Period

    8 1198 16 10.287ns

    12 2516 19 10.287ns16 4165 22 10.287ns

    4. Conclusion

    We presented a methodology for the formulation ofcomputing methods for digital signal processing, basedin a Kronecker products decomposition technique thatallow us to map a computing method in the form ofalgorithm formulations, and to find new variants toadapt the algorithm to specific hardware computingarchitecture. We represented the algorithms in termsof functional blocks for modeling and simulation usingSimulink. The modeled DSP applications wereimplemented in a FPGA using a system generator thattranslated a Simulink model in a Hardware descriptionlanguage for FPGA implementation. Also, aDevelopers Kit for Texas Instruments toolbox can beused to generate machine code for a TI-C6000 DSPProcessor from a Simulinkmodel.

    References[Johnson90] J. R. Johnson, R. W. Johnson, D. Rodrguez, R.Tolimieri, A Methodology for Designing, Modifying, andImplementing Fourier Transform Algorithms on VariousArchitectures,IEEE Transactions on Circuits, Systems, andSignal Processing, Vol. 9, No. 4, 1990.

    [Rodriguez02] D. Rodrguez, A. Quinchanegua, H. Nava.Signal Operator Cores for SAR Real Time ProcessingHardware, IEEE IGARSS 2002, Toronto, Canada, July2002.

    [Rodriguez91]Domingo Rodriguez, A new FFT algorithmand its implementation on the DSP96002, ICASSP 91, Vol.3, pp. 2189-2192, May 1991.

    [Rodriguez91] Domingo Rodriguez, Tensor product algebraas a tool for VLSI implementation of the discrete Fouriertransform, ICASSP 91, Vol. 2, pp.1025-1028:May 1991.

    [Reyneri01] L.M. Reyneri, F. Cuccinotta, A. Serra, L.Lavagno. A Hardware/Software Co-design Flow and IP

    Library Based of SimulinkTM

    , Design AutomationConference, pp. 2101-2106, Vol. 4. 2001.

    [Xilinx03] Xilinx, Inc, http://www.xilinx.com

    [Math03] The Mathworks, Inc, http://www.mathworks.com

    [TI03] Texas Instruments Inc, http://dspvillage.ti.com