alberto qunchaor
TRANSCRIPT
-
8/13/2019 Alberto QunchaOr
1/4
Signal Processing Implementations Using Simulink
Alberto Quinchanegua, William SnchezAdvisor: Domingo Rodrguez
Electrical and Computer Engineering DepartmentUniversity of Puerto Rico, Mayagez Campus
Mayagez, Puerto Rico [email protected], [email protected]
Abstract
This document presents a methodology for theimplementation of signal processing applications basedin a Kronecker product language to map a mathematicalformulation in an algorithm formulation, andSimulink, which is a dynamic system and simulation
software based in MATLAB for modeling and highlevel simulation. Simulinkadds many features specificto dynamics systems while retaining all MATLABgeneral purpose functionality. A mathematicalformulation can be expressed in terms of functionalmodules with defined input and output ports that can beinteractively interconnected using Simulink in order toreduce the algorithm development and implementationtime-line. Two important signal processing applicationssuch as FIR filters and fast Fourier Transform aretreated here as systems, that were implemented inhardware using a system generator toolbox, whichtranslate a Simulink model in a hardware description
language (HDL) for FPGA implementations.
1. Introduction
Simulinkis a dynamic system and simulation softwarethat offers an interactive scientific and engineeringenvironment for system modeling, analysis, andsimulation. This environment is useful for the rapidimplementation of a digital signal processingapplication in terms of functional blocks, and providesa high-level simulation. A functional block is a basicstructure that can represent a function, or a specializedsystem, with defined input and output ports andcustomized parameters. In this work we concentrated in
the implementation of digital signal processingalgorithms using block diagrams for the data flowmodeling, and algorithm testing and implementation.For this purpose, we introduce a methodology for theDSP algorithm development and hardwareimplementation, based in a Kronecker productlanguage, in order to assist to a developer in thegeneration of hardware prototype systems. We definean algorithm development and implementationenvironment as an aggregated of the following tools:
Simulink, DSP microprocessors, and FPGAcomputing structures. We first introduce themathematical formulations for two importantapplications in digital signal processing, such as FIRfilters, and the fast Fourier transform; thus we proceedto map the mathematical formulations in algorithmsexpressed as interconnected functional blocks that can
be translated in C code for DSP microprocessorimplementations, or in VHDL code using a SystemGenerator for FPGA implementations. Hardwareimplementation results of our methodology are
presented using a Xilinx Virtex-II FPGA computingunit.
2. Modeling DSP Applications
Simulink uses a set of libraries for signal processingto represent dynamic systems. The standard blocklibrary is organized into several subsystems, grouping
blocks according with the behavior and it contributeswith the design of new blocks by a developer, forincrease the modularity of the applications. On theother hand, the reconfigurability of a developed modelis not complicated because the cut, copy, andpaste commands are available to include blocks bysoftware. The Figure 1 shows an example of aSimulinkmodel built with functional blocks.
Figure 1.An example of a Simulinkmodel composedof interconnected functional blocks.
-
8/13/2019 Alberto QunchaOr
2/4
In order to introduce the methodology for the modelgeneration and the implementation of DSP applicationswe introduce the mathematical formulations for FIRfilters and the FFT, using a Kronecker productslanguage, and then we present the procedure todescribe an algorithm in terms of functional blocks fortesting, simulation, and code generation for hardwareimplementation.
2.1 Mathematical Formulations
2.2.1 FIR Filters
A simplest FIR system or operator, apart from thetrivial system, i.e., the system represented by the
identity operator IN, is the system{ }1h
T T
= , which
turns into the shift operator SN . This system is
sometimes called the unit delay system because itsdigital electronics hardware implementation may be
accomplished by using a single delay element, which
acting on the unit sample signal { }1 results in
{ } { }[ ]T j S S S
j ZN
j
N
N
1 1
1= = = . The matrix
representing the shift operator SN is obtained byallowing the vector representation (with respect to the
standard basis set { }{ }k Nk Z: ) of the signal
{ } { }{ }T k Zk N 1 , , become the columns of thematrix. An important property of the SNoperator matrixis that any cyclic FIR system T
hmay be represented by
a matrixHN which can be written as a sum of powersof the matrix SN pre-multiplied by a diagonal matrix
[ ]D
h j, as follow:
[ ] [ ]( )
==N NZj Zj
jN
jNjhN SjhSDH . (1)
2.2.2 Fast Fourier Transform
The Fourier matrix can be factorized in terms of sparsematrices represented by Kronecker products factors,which play an important role in the development and
implementation of Fourier Transform algorithms. Forexample, the traditional Cooley-Tukey algorithm forthe fast computation of the Fourier Transform can berepresented in terms of Kronecker products factors asfollows:
( ) ( ) RNSRSNSRN PFITIFF ,, = (2)
where N is the size of the FFT, N=R.S, and IN is anidentity matrix. In this case, N is a power of two. A
new variant for the expression of the matrix NF is
derived using Kronecker product properties in order tocompute a N-Point FFT, where Ncan be a non-powerof two. The new mathematical formulation can bewritten as follows:
) ( ) SNpSLpT
N PFUITIFUF ,222,222 = (3)
where pN 4= , and p is a prime. The expressions (2)
and (3) allow us to identify functional primitives orbasic structures that compose an algorithm. Thesefunctional primitives can be modeled as blockdiagrams, using the mathematical expression to definethe data flow for the algorithm computation. Forexample, consider the Kronecker factor
222 FUI p that represents a first stage in the
FFT computation. Graphically we can determine thedata flow generated by the action of the Kroneckerfactor over an input vector of length 4p, being p=1;
then:
( ) xFxFUI
= 2222 1
1
10
01 (4)
( ) x
F
F
F
F
xFUI
=
2
2
2
2
222
0
0
0
0
(5)
where
=11
112F .
The expressions (4) and (5) below represent the actionof matrices 2F over segments of the input vector x ,
where the operations over a same segment areduplicated, generating a data scattering stage. Thisscattering represents a parallel processing stage that
permits to organize the stages of Twiddle factorspresent in the traditional Cooley-Tukey algorithm in asingle parallel stage of Hadamard products. A diagonal
matrix SLT , contains the twiddle factors. The same
procedure explained below is carried out to identify the
dataflow generated by the factor ( )pT IFU 222 ,which is translated in a final stage of data gathering.
2.2. Functional Blocks
2.2.1 FIR Model:
In a real life a designer has to do the mapping from themathematical representation of a FIR filter, to symbolicrepresentation in Simulink. From the mathematicalformulation previously presented in the expression (1),
-
8/13/2019 Alberto QunchaOr
3/4
the designer can observe the implicit mathematicaloperations such as Add, Multiplication, and Delay andtranslate these in Simulinkfunctional blocks. For a 4-tap FIR filter implementation, Simulink offers theDelay Blocks, Multiplication Blocks, and AdditionsBlocks that the designer can interconnect to obtain theFIR filter structure for simulation and implementationas shows the Figure2.
Figure 2. Simulink model of a 4-tap FIR Filter.
Notice in the figure below that the FIR filter structurehas a repetitive structure conformed by a delay block, amultiplication block, and an adder block as shows theFigure3. After the model simulation, the designer canmake new functional blocks, creating new librarieswith defined input and output ports, to be used in futureapplications.
Figure 3.Identification of a new functional block forthe implementation of a FIR filter.
2.2.2. 4p-FFT Model
The 4p-FFT algorithm can be represented in Simulink,
in terms of interconnected fuctional blocks for analysisand simulation. Each functional block is associatedwith a functional primitive, simulating the algorithmdata flow in every stage of the FFT computation. Theinitial stage of the FFT can be implemented as a
butterfly stage followed by a scattering stage as shows
the Figure 4. A functional block representing thedataflow for the butterfly is shown in the Figure 5.
-1
a
b
a+ b
a- b
a+b
a+b
a-b
a-b
Figure 4.Flow graph of a Butterfly followed by a
scattering stage.
a
ba+b
a
ba-b
In_Port_1
In_Port_2
Out_Port_1
Out_Port_2
Figure 5.Butterfly functional block for theimplementation of the factor 222 FUI p .
The others stages of the FFT computation are theHadamard product or point-by-point multiplications,and the data gathering. Thus, the complete 4p-FFT
algorithm can be modeled as shows the Figure 6.
DATA
SCATTERING
HADD
AMARD
PRODU
CTWITH
TWIDDLE
FACTORS
D
ATA
GATHERING
4p-FFT Dataflow Model
INPUT OUTPUT
Figure 6.Data flow model for the 4p FFT computation.
3. Hardware Implementations
This section presents preliminary results of the FPGAhardware implementation of a 4-tap FIR filter, and anew variant of the traditional Cooley-Tukey FFTalgorithm for lengths that are not necessarily power oftwo. The FIR filter previously presented in theFigure 2 was implemented in a Virtex-II FPGA using aSystem Generator toolbox for the generation of VHDLcode. The implementation results of the FIR filter arecompared with a previous implementation of the same4-tap FIR filter in a TI-C6711 DSP microprocessor and
summarized in the Table 1.
-
8/13/2019 Alberto QunchaOr
4/4
Table 1.Hardware implementation resultsof a 4-Tap FIR filter.
ITEM Virtex-IIFPGA
TI-C6711 DSP
-Processor.Data Bus
Input1~n* Bits
Fixed Point32 Bits Floating
Point.
Data BusOutput
1~n* BitsFixed Point
32 Bits FloatingPoint.
System ClockPeriod
10 nanoseconds(max. 100
MHz)
6.7 nanoseconds(150 MHz)
Latency 1 clock cycle** 49 clock cycles
Simulation Logic Devices,Bit stream.
Registers, Bytes,Words.
ProgrammingLanguage
VHDL C, assembly
Scalability Data BusWidth
Fixedarchitecture.
* n32 using Xilinx Logic Cores.** Initial latency: 11 clock cycles.
The 4p-FFT algorithm is based in a Kroneckerformulation that permits to organize the stages ofTwiddle factors present in the traditional Cooley-Tukeyalgorithm in a single parallel stage of Hadamard
products, reducing the latency time generated by everymultiplication stage. The algorithm was simulated inSimulink and the model translated in a hardwaredescription language (VHDL) using a System
Generator Core for FPGA implementation. The Figure7 shows a block diagram of the developed hardware
structure composed of interconnected functionalblocks.
HADAMARD
Z-1
ADD
&
SUB
Z-2
ADD
CONTROL
F
I
F
O
SCATTERING GATHERING
ADD
&
SUB
4p-FFT CORE
N=4p
Fixed Point
16-BIT
COMPLEX
xA
16-BIT
COMPLEX
xB
32
32
16-BIT
COMPLEX
X[k]
X[0]
X[1]
X[2]
X[N-1]
32
32
32
32
1
2
3
N/2
Figure7Block diagram of the 4p- FFT hardwarestructure.
The 4p-FFT model was implemented in hardware usinga Xilinx Virtex-II FPGA and the results are presentedin the Table 2.
Table 2.FPGA implementation results of the4p-FFT model.
N-
Complex
Point
Slice
Consumption
Initial
Latency
Time
(Cycles)
System
Clock
Period
8 1198 16 10.287ns
12 2516 19 10.287ns16 4165 22 10.287ns
4. Conclusion
We presented a methodology for the formulation ofcomputing methods for digital signal processing, basedin a Kronecker products decomposition technique thatallow us to map a computing method in the form ofalgorithm formulations, and to find new variants toadapt the algorithm to specific hardware computingarchitecture. We represented the algorithms in termsof functional blocks for modeling and simulation usingSimulink. The modeled DSP applications wereimplemented in a FPGA using a system generator thattranslated a Simulink model in a Hardware descriptionlanguage for FPGA implementation. Also, aDevelopers Kit for Texas Instruments toolbox can beused to generate machine code for a TI-C6000 DSPProcessor from a Simulinkmodel.
References[Johnson90] J. R. Johnson, R. W. Johnson, D. Rodrguez, R.Tolimieri, A Methodology for Designing, Modifying, andImplementing Fourier Transform Algorithms on VariousArchitectures,IEEE Transactions on Circuits, Systems, andSignal Processing, Vol. 9, No. 4, 1990.
[Rodriguez02] D. Rodrguez, A. Quinchanegua, H. Nava.Signal Operator Cores for SAR Real Time ProcessingHardware, IEEE IGARSS 2002, Toronto, Canada, July2002.
[Rodriguez91]Domingo Rodriguez, A new FFT algorithmand its implementation on the DSP96002, ICASSP 91, Vol.3, pp. 2189-2192, May 1991.
[Rodriguez91] Domingo Rodriguez, Tensor product algebraas a tool for VLSI implementation of the discrete Fouriertransform, ICASSP 91, Vol. 2, pp.1025-1028:May 1991.
[Reyneri01] L.M. Reyneri, F. Cuccinotta, A. Serra, L.Lavagno. A Hardware/Software Co-design Flow and IP
Library Based of SimulinkTM
, Design AutomationConference, pp. 2101-2106, Vol. 4. 2001.
[Xilinx03] Xilinx, Inc, http://www.xilinx.com
[Math03] The Mathworks, Inc, http://www.mathworks.com
[TI03] Texas Instruments Inc, http://dspvillage.ti.com