International Journal of Technology, Management & Knowledge Processing
Hardware Implementation of Parallel Adder/Subtractor and
Complex Multiplier using Xilinx IP-Core
B. Khaleelu Rehman1, Waaiz Mohammad2, Mudasar Basha3, Salauddin Mohammad4
1Dept. of ECE, Nalla Malla Reddy Engineering College, Hyderabad, India
2Physics, Govt. College for Men, Kurnool, India
3Dept. of ECE, BVRIT, Hyderabad, India
4Dept. of ECE, J.B. Institute of Engineering & Technology, Hyderabad, India
Abstract---This paper targets Xilinx intellectual property
(IP) cores and a methodology that makes it easy to implement
IP cores, their functionality, and their interface with recent
Xilinx FPGAs. The proposed work is developed with Xilinx
ISE 14.7 and its associated IP cores. VHDL is used to describe
the hardware and its functionality. A complex multiplier and an
N-bit parallel adder/subtractor are designed using the Xilinx IP
core approach and implemented on a Virtex-5 XC5VXT50T device.
Keywords— FPGA, VHDL, Xilinx IP, Virtex-5, ISE
I. Introduction
Complex multipliers are widely used in digital signal
processing applications such as discrete Fourier transforms and
multiply-and-accumulate (MAC) operations, and the speed of
such processors depends largely on the speed of the complex
multipliers they use. An N-bit parallel adder/subtractor adds
two inputs A (augend) and B (addend), or subtracts B
(subtrahend) from A (minuend), as shown in Fig. 1.1. FA1 is
the first full adder, FA2 is the second, and FAn is the nth. Ci1
is the initial carry into the first full adder. S1, S2, ..., Sn-1, Sn
are the sums and CO1, CO2, ..., COn-1, COn are the carries of
the respective full adders. The P line is the control signal:
depending on input p, the circuit behaves as an adder or as a
subtractor. If p is '0', the XOR gates on the B inputs pass them
unchanged and the circuit behaves as an adder; if p is '1', the
circuit behaves as a subtractor.
If one input of an XOR gate is '0', its output equals the other
input, i.e., 0⊕A = A: if A is '0' the output is '0', and if A is '1'
the output is '1', which gives the adder behaviour described
above. Similarly, if one input of the XOR gate is '1', its output
is the complemented input, i.e., 1⊕A = A': if A is '0' the output
is '1', and if A is '1' the output is '0', which gives the
subtractor behaviour.
Consider a 4-bit adder/subtractor as an example. One 4-bit
input A is A3, A2, A1, A0, and the other 4-bit input B is B3, B2,
B1, B0. S3, S2, S1, S0 are the sums of the full adders, B3', B2',
B1', B0' are the inverted inputs, and CO4 is the carry obtained
from A3 − B3. Using 2's complement notation, the subtraction
of 4-bit numbers is obtained as A − B = A + B' + 1, where the
extra 1 is supplied by setting the initial carry Ci1 to '1'.
Fig 1.1 N-bit parallel adder/subtractor block diagram
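The behaviour of Fig. 1.1 can be checked with a short bit-level model. The Python sketch below is illustrative, not the authors' VHDL; the function names `full_adder` and `add_sub` are hypothetical, and it assumes, as in the usual form of this circuit, that the control line p drives both the XOR gates on B and the initial carry Ci1, so that p = 1 computes A + B' + 1 = A − B.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One full-adder stage on single bits: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub(a: int, b: int, p: int, n: int) -> tuple[int, int]:
    """N-bit ripple-carry adder/subtractor controlled by p (0 or 1).

    p = 0: S = A + B        (XOR gates pass B unchanged, Ci1 = 0)
    p = 1: S = A + B' + 1   (XOR gates invert B, Ci1 = 1), i.e. A - B
    Returns (S as an n-bit integer, final carry COn).
    """
    carry = p                          # assumption: p also drives Ci1
    s = 0
    for i in range(n):                 # FA1 ... FAn, LSB first
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ p     # 0 XOR B = B, 1 XOR B = B'
        s_bit, carry = full_adder(a_bit, b_bit, carry)
        s |= s_bit << i
    return s, carry


# 4-bit examples (A = 0110, B = 0011)
print(add_sub(0b0110, 0b0011, p=0, n=4))   # (9, 0): 6 + 3
print(add_sub(0b0110, 0b0011, p=1, n=4))   # (3, 1): 6 - 3
```

For unsigned operands, a final carry COn = 1 in subtract mode indicates that no borrow occurred (A ≥ B).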
VOLUME 1, ISSUE 1, AUG 2021
In this way an N-bit parallel adder/subtractor can be
implemented. A 64-bit parallel adder/subtractor requires 64
full adders connected in cascade. For example, if the delay of
each full adder is 8 ns, the carry must ripple through all 64
stages before the final output is available, which takes
8 ns × 64 = 512 ns.
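The 512 ns figure is a worst case: it is reached only when a carry generated in the first stage has to ripple through all 64 stages. The Python sketch below, with hypothetical function names and the same assumed 8 ns per full adder, finds the longest carry-propagation chain for given operands and converts it to a delay. It models only carry rippling, not wiring or XOR-gate delay.

```python
def longest_carry_chain(a: int, b: int, n: int, cin: int = 0) -> int:
    """Longest number of consecutive stages one carry travels through.

    Stage i generates a carry when a_i = b_i = 1 and propagates an
    incoming carry when exactly one of a_i, b_i is 1.
    """
    carry, run, longest = cin, 0, 0
    for i in range(n):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        gen, prop = a_bit & b_bit, a_bit ^ b_bit
        if gen:
            run = 1                # a new carry starts in this stage
        elif prop and carry:
            run += 1               # the incoming carry ripples through
        else:
            run = 0                # no carry leaves this stage
        carry = gen | (prop & carry)
        longest = max(longest, run)
    return longest


def ripple_delay_ns(chain: int, t_fa_ns: float = 8.0) -> float:
    """Each stage the carry passes through adds t_fa_ns; at least one
    stage is always needed to form a sum bit."""
    return max(chain, 1) * t_fa_ns


N = 64
worst = longest_carry_chain((1 << N) - 1, 1, N)    # 111...1 + 000...1
print(worst, ripple_delay_ns(worst))               # 64 512.0
typical = longest_carry_chain(0x2000, 0x1000, N)   # no carry is generated
print(typical, ripple_delay_ns(typical))           # 0 8.0
```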
II. Literature Review
Many researchers have used N-bit parallel adder/subtractors to
design arithmetic and logic circuits. Single-precision and
double-precision floating-point notations have been used for
adding unsigned numbers, with the carry-select adder providing
the best results; device utilization summaries and timing
analyses are reported in [1]. A Binary Integer Decimal (BID)
encoding method is used for adding decimal floating-point
numbers in [2], where a Virtex-5 FPGA hardware kit is used for
experiments and a synthesis report is discussed. In [3], 64-bit
adders are designed using ripple-carry and carry look-ahead
architectures with different specifications, and a Spartan-7
FPGA kit is used. In [4], 64-bit and 32-bit adders are designed
using a pipelining approach, and area, power, and timing are
analyzed along with the back-end design. FPGA-based N-bit
parallel adder/subtractors have also been designed by other
researchers [5][6][7].
III. PROPOSED WORK
In digital design, engineers use a hardware description
language to describe complex logic functions. If a design
engineer had to write a multiplier or complex multiplier from
scratch in every project that needs one, it would amount to
reinventing the wheel and wasting time. Likewise, re-coding
the same function again and again is difficult and costs both
time and money. One solution to this problem is to use Xilinx
IP (intellectual property) cores [8]. An IP core is a piece of
HDL code that has already been written to perform a specific
task, and reusing it saves the designer's time.
Figure 3.1 shows the Xilinx Complex Multiplier 3.1 IP core
[9]. To open the IP core, follow these steps: open the Xilinx
ISE or Xilinx Vivado suite; click File and then New Project;
give the project a name and specify the device (Virtex-5)
details; then select IP (CORE Generator & Architecture
Wizard). The window shown in Fig. 3.1 appears. Many IP
cores are available; under the basic elements, the complex
multiplier is used in this design.
Fig 3.1 IP core generation wizard
Figure 3.3 shows the Complex Multiplier IP core 3.0. The IP
core window in Fig. 3.4 has two halves. The left side shows the
IP symbol, which is a prototype of the RTL but not the exact
RTL; the exact RTL can be observed after writing the HDL
code and instantiating the complex multiplier IP core.
AR[7:0] and AI[7:0] are the 8-bit real and imaginary parts of
the first input, BR[7:0] and BI[7:0] are the 8-bit real and
imaginary parts of the second input, and clk is the clock input.
PR[16:0] is the 17-bit real output and PI[16:0] is the 17-bit
imaginary output. Five inputs and two outputs are enabled; all
remaining pins are disabled. Round_Cy is the carry input that
facilitates unbiased rounding, CE is the clock enable, and
SCLR is the synchronous clear. Round_Cy, CE, and SCLR are
disabled, and all disabled inputs and outputs are optional.
Fig 3.2 64-bit parallel adder/subtractor
Fig 3.3 Xilinx Complex multiplier IP core
Fig 3.4 Complex Multiplier IP core
The right side of the IP core window configures the AR/AI
operands. The operand width can be set from a minimum of 8
bits to a maximum of 64 bits, so each operand (real or
imaginary) spans 2^8 = 256 values when 8 bits are selected
and up to 2^64 values when 64 bits are selected. The BR/BI
operands are configured in the same way. Under the multiplier
construction options, 4 DSP slices are used when the
optimization goal is performance, and 3 DSP slices are used
when the resource-sharing approach is selected. The designer
can also choose a LUT-based implementation. By default the
full-precision product is output: for 8-bit operands the real
and imaginary parts are each 17 bits wide, but they can be
truncated.
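The two multiplier construction options can be illustrated with the standard complex-product identities. The Python sketch below is an assumption about the arithmetic only, not a description of how the core maps onto DSP slices; the names `cmul4`, `cmul3`, and `signed_bits` are hypothetical, and two's-complement (signed) 8-bit operands are assumed. It also checks why a full-precision product needs 17 bits: with all four inputs at −128, the imaginary part reaches 32768, one more than the largest 16-bit signed value.

```python
from itertools import product


def cmul4(ar: int, ai: int, br: int, bi: int) -> tuple[int, int]:
    """Four real multiplications: (ar + j*ai)(br + j*bi)."""
    return ar * br - ai * bi, ar * bi + ai * br


def cmul3(ar: int, ai: int, br: int, bi: int) -> tuple[int, int]:
    """Three real multiplications (Gauss's trick): one multiply is
    traded for extra additions and subtractions."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2


def signed_bits(v: int) -> int:
    """Smallest two's-complement width that can hold v."""
    n = 1
    while not -(1 << (n - 1)) <= v <= (1 << (n - 1)) - 1:
        n += 1
    return n


print(cmul4(3, 2, 4, 5), cmul3(3, 2, 4, 5))   # (2, 23) (2, 23)

# Both outputs are bilinear in the inputs, so their extremes occur at the
# corner values of the 8-bit signed range.
corners = (-128, 127)
width = max(signed_bits(x)
            for ops in product(corners, repeat=4)
            for x in cmul4(*ops))
print(width)   # 17
```

The 3-multiplier form saves one multiplication at the cost of extra pre-additions (ar + ai, bi − br, br + bi), whose results are one bit wider than the operands.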
To design the 64-bit parallel adder/subtractor, the
Adder/Subtracter IP core is used, which can add, subtract, or
add/subtract signed and unsigned numbers up to 256 bits wide.
It can therefore be configured as a parallel adder, a parallel
subtractor, or a parallel adder-cum-subtractor.
Figure 3.2 shows the 64-bit unsigned parallel adder/subtractor
IP core. As for the complex multiplier, the left side shows the
IP symbol, a prototype of the RTL but not the exact RTL; the
exact RTL can be observed after writing the HDL code and
instantiating the adder IP core. A[63:0] is the 64-bit
augend/minuend input, B[63:0] is the addend/subtrahend input,
and clk is the clock input. Besides these three inputs, ADD is
enabled; it is used when the add/subtract mode is selected. The
remaining input ports are disabled: C_in (carry input), CE
(clock enable), SCLR (synchronous clear), SSET (synchronous
set), and SINIT (synchronous init). Among the outputs,
S[64:0], the 65-bit sum/difference, is enabled and C_out (carry
output) is disabled. All disabled inputs and outputs are
optional. For any N-bit parallel adder/subtractor, ADD is a
one-bit input: if ADD is '1', the 64-bit operands are added;
otherwise they are subtracted. The right side of the window
holds the component selection, where the implementation type
can be Fabric or DSP48. The Fabric implementation uses 65
LUTs and 65 flip-flops. DSP48 is not available for the 64-bit
adder/subtractor, because the DSP48 implementation, which
uses one DSP48 slice of the FPGA, supports a maximum width
of 47 bits.
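The port-level behaviour configured above (64-bit unsigned A and B, 65-bit S, ADD = '1' for addition) can be captured in a small Python reference model, for example to predict testbench values. This is a hypothetical sketch, not Xilinx code: the names `addsub`, `WIDTH`, and `S_MASK` are illustrative, it assumes S is A + B or A − B wrapped to 65 bits (so a negative difference appears in 2's complement form), and it omits the clock and any pipeline latency.

```python
WIDTH = 64                        # A[63:0], B[63:0]
S_MASK = (1 << (WIDTH + 1)) - 1   # S[64:0] is one bit wider than A and B


def addsub(a: int, b: int, add: int) -> int:
    """Hypothetical reference model of the configured core.

    add = 1 -> S = A + B; add = 0 -> S = A - B.
    The result is wrapped to 65 bits, so a negative difference
    appears in 2's complement form (assumption).
    """
    if not (0 <= a < 1 << WIDTH and 0 <= b < 1 << WIDTH):
        raise ValueError("operands must be 64-bit unsigned values")
    return (a + b if add else a - b) & S_MASK


# Vectors matching the simulations of Figs. 3.10 and 3.11
print(hex(addsub(0x2000, 0x1000, add=1)))   # 0x3000 -> 12288
print(hex(addsub(0x2400, 0x1200, add=0)))   # 0x1200 -> 4608
```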
The RTL schematic of the 8-bit complex multiplier IP core is
shown in Fig. 3.5; Xilinx generates an .ngr file for the RTL
schematic. clk is the clock input; ar(7:0) and ai(7:0) are the
8-bit real and imaginary parts of input 1; br(7:0) and bi(7:0)
are the 8-bit real and imaginary parts of input 2; and pr(16:0)
and pi(16:0) are the real and imaginary parts of the output.
Figure 3.4 is the prototype of the RTL, and Fig. 3.5 is the exact
RTL.
Fig 3.5 RTL view of 8 bit complex multiplier IP core
Fig. 3.6 RTL schematic of 64-bit parallel adder/subtractor IP core
The RTL schematic of the 64-bit parallel adder/subtractor IP
core is shown in Fig. 3.6; Xilinx generates an .ngr file for the
RTL schematic. clk is the clock input, a(63:0) is the
augend/minuend input, and b(63:0) is the addend/subtrahend
input of the adder/subtractor. add is a one-bit input: when add
is high the circuit acts as an adder, and when add is low it acts
as a subtractor. s(64:0) is the sum/difference output. Figure
3.2 is the prototype of the RTL and Fig. 3.6 is the exact RTL.
From this discussion, add is the control input, playing the
same role as p in Fig. 1.1 but with the opposite polarity: if add
is '1', the 64-bit operands are added; otherwise they are
subtracted.
Fig. 3.7 Internal schematic of 64-bit parallel adder/subtractor IP core
The internal schematic of the 64-bit parallel adder/subtractor
IP core is shown in Fig. 3.7. In the instantiation of the CORE
Generator component, x(63:0) is mapped to a(63:0), y(63:0)
to b(63:0), the clk pin to the signal of the same name, p to
add, and the output s(64:0) to sp(64:0).
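The multiplication of two complex numbers is illustrated
below, where i is the imaginary unit:
$$
\begin{aligned}
(a_r + i\,a_i)(b_r + i\,b_i) &= a_r b_r + i\,a_r b_i + i\,a_i b_r + i^2 a_i b_i \\
&= (a_r b_r - a_i b_i) + i\,(a_r b_i + a_i b_r) \qquad (\text{since } i^2 = -1)
\end{aligned}
$$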
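As a worked instance of the identity $(a_r + i\,a_i)(b_r + i\,b_i) = (a_r b_r - a_i b_i) + i\,(a_r b_i + a_i b_r)$, the inputs applied in the simulation of Fig. 3.8 (ar = 3, ai = 2, br = 4, bi = 5) give the following expected product, computed here rather than read from the waveform:
$$
(3 + 2i)(4 + 5i) = (3 \cdot 4 - 2 \cdot 5) + i\,(3 \cdot 5 + 2 \cdot 4) = 2 + 23i
$$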
Fig. 3.8 8-bit Complex multiplier simulation using IP Core
Fig 3.9 Complex multiplier example
Figure 3.8 shows the ISIM simulation waveform of the 8-bit complex multiplier IP core. ISIM is the default Xilinx simulator, but third-party simulators such as ModelSim from Mentor Graphics [10] produce the same waveform. In Fig. 3.8, clk is the clock input, sampled on its rising edge; ar[7:0], the 8-bit real part of the first input, has the decimal value 3; ai[7:0], the 8-bit imaginary part of the first input, is 2; br[7:0], the 8-bit real part of the second input, is 4; and bi[7:0], the 8-bit imaginary part of the second input, is 5.
Figure 3.10 shows the ISIM simulation waveform of the 64-
bit parallel adder; as for the complex multiplier, ModelSim
[11] gives the same waveform. In Fig. 3.10, input a[63:0] has
the hexadecimal value "0000000000002000" (decimal 8192),
input b[63:0] has the hexadecimal value "0000000000001000"
(decimal 4096), and output s[64:0] has the hexadecimal value
"000000000000003000" (decimal 12,288). clk is the clock
input, sampled on its rising edge. Because add is at logic high
('1'), the two 64-bit numbers are added.
Figure 3.11 shows the ISIM simulation waveform of the 64-
bit parallel subtractor. Input a[63:0] has the hexadecimal value
"0000000000002400" (decimal 9216), input b[63:0] has the
hexadecimal value "0000000000001200" (decimal 4608), and
output s[64:0] has the hexadecimal value "000000000000001200"
(decimal 4608). Because add is at logic low ('0'), b is
subtracted from a. The simulation waveforms in Figs. 3.10 and
3.11 use unsigned decimal values.
Fig. 3.10 64-bit parallel adder simulation using IP Core
Fig. 3.11 64-bit parallel subtractor simulation using IP Core
Table 3.1 64-bit parallel adder/subtractor IP core synthesis report
Table 3.1 shows the synthesis report of the 64-bit parallel
adder/subtractor IP core. The design uses 65 slice registers, 66
slice LUTs, 65 fully used LUT-FF pairs, 194 input/output
blocks (IOBs), and 1 global clock buffer. The maximum
combinational path delay of the parallel adder/subtractor circuit
is 2.826 ns, of which 2.540 ns is logic delay and 0.286 ns is
routing delay. The total real time to XST (Xilinx Synthesis
Technology) completion is 21.00 s, the total CPU time to XST
completion is 21.02 s, and the total memory usage is
4,505,144 KB.
Fig. 3.12 64 bit parallel adder/subtractor power report
The power report of the 64-bit parallel adder/subtractor is
shown in Fig. 3.12. The total power [11] consumed by the
design is 0.560 W. To estimate the power dissipation, the Xilinx
XPower Analyzer is required. Its input is the .ncd (Native
Circuit Description) file, used together with the .pcf (physical
constraints) file that Xilinx generates; loading the .ncd file into
XPower Analyzer produces the power report of the design. The
power analysis software reports the device utilization as well as
the total on-chip power of the design. Comparing Fig. 3.6 and
Table 3.1, the design occupies 65 slice registers, and the same
figure appears in the power report. Of the 480 available
input/output blocks, only 194 are used. The design has a single
clock input, which drives the global clock, as seen both in
Fig. 3.6 and in Table 3.1.
IV. Conclusion
The VHDL implementation of a 64-bit adder/subtractor
and a complex multiplier using IP core generation has been
carried out. The device utilization summary, RTL views,
simulation results, and power report were obtained for the
Virtex-5 XC5VXT50T [13] FPGA device, which is built on
65 nm technology. Xilinx ISIM is used for simulation, and
Xilinx ISE 14.7 is used for synthesis and place and route.
The total power consumed by the design is 0.56 W.
Acknowledgement
The authors would like to thank the VLSI Design Lab,
Department of Electronics and Communication Engineering,
Nalla Malla Reddy Engineering College, Hyderabad, for its
cooperation and for providing the laboratory facilities to carry
out this research work.
References
[1] Parte, Ragini, and Jitendra Jain. "Analysis of Effects of using Exponent Adders in IEEE-754 Multiplier by VHDL." 2015 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2015]. IEEE, 2015.
[2] Farmahini-Farahani, Amin, Charles Tsen, and Katherine Compton. "FPGA Implementation of a 64-Bit BID-based decimal floating-point adder/subtractor." 2009 International Conference on Field-Programmable Technology. IEEE, 2009.
[3] Reddy, Konda Sai Prakash. "Designing Various 64-bit Adders using VHDL in VIVADO." Journal of Innovation in Electronics and Communication Engineering 9.2 (2019): 6-11.
[4] Aarthy, M. "ASIC Implementation of 32 and 64 bit Floating Point ALU using Pipelining." International Journal of Computer Applications 94.17 (2014).
[5] Jaiswal, Manish Kumar, and Ray CC Cheung. "High performance FPGA implementation of double precision floating point adder/subtractor." International Journal of Hybrid Information Technology 4.4 (2011): 71-80.
[6] Vitoroulis, Konstantinos, and Asim J. Al-Khalili. "Performance of parallel prefix adders implemented with FPGA technology." 2007 IEEE Northeast Workshop on Circuits and Systems. IEEE, 2007.
[7] Khan, Mozammel HA, and Marek A. Perkowski. "Quantum ternary parallel adder/subtractor with partially-look-ahead carry." Journal of Systems Architecture 53.7 (2007): 453-464.
[8] Ghosh, Santosh, Ingrid Verbauwhede, and Dipanwita Roychowdhury. "Core based architecture to speed up optimal ate pairing on FPGA platform." International Conference on Pairing-Based Cryptography. Springer, Berlin, Heidelberg, 2012.
[9] Rao, K. Deergha, Ch Gangadhar, and Praveen K. Korrai. "FPGA implementation of complex multiplier using minimum delay Vedic real multiplier architecture." 2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON). IEEE, 2016.
[10] Rangaraju, H. G., et al. "Low power reversible parallel binary adder/subtractor." arXiv preprint arXiv:1009.6218 (2010).
[11] Akhter, Shamim, Gaurav Raturi, and Shaheen Khan. "Analysis and design of residue number system based building blocks." 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE, 2018.