
International Journal of Technology, Management & Knowledge Processing


Hardware Implementation of Parallel Adder/Subtractor and Complex Multiplier using Xilinx IP-Core

B. Khaleelu Rehman1, Waaiz Mohammad2, Mudasar Basha3, Salauddin Mohammad4

1 Dept of ECE, Nalla Malla Reddy Engineering College, Hyd, India.
2 Physics, Govt College for Men, Kurnool, India.
3 Dept of ECE, BVRIT, Hyd, India.
4 Dept of ECE, J.B Institute of Engineering & Technology, Hyderabad, India.

Abstract---This paper targets the Xilinx intellectual property (IP) cores and presents a methodology for easily implementing the IP cores, their functionality, and their interface with recent Xilinx FPGAs. The proposed work is developed with Xilinx ISE 14.7 and the IP cores associated with it. VHDL is used to describe the hardware and its functionality. A complex multiplier and an N-bit parallel adder/subtractor are designed using the Xilinx IP core approach and implemented on a Virtex-5 XC5VXT50T device.

Keywords— FPGA, VHDL, Xilinx IP, Virtex-5, ISE

I. Introduction

Complex multipliers are widely used in digital signal processing applications such as the discrete Fourier transform and multiply-and-accumulate (MAC) operations. The speed of a processor depends mainly on the speed of the complex multipliers it uses. An N-bit parallel adder/subtractor adds two inputs, A (augend) and B (addend), and also subtracts two inputs, A (minuend) and B (subtrahend), as shown in figure 1.1. FA1 is the first full adder, FA2 is the second full adder, and FAn is the nth full adder. Ci1 is the initial carry of the first full adder. S1, S2, ..., Sn-1, Sn are the sums of the respective full adders, and CO1, CO2, ..., COn-1, COn are the carries of the respective full adders. The P line is the control signal: depending on the input ‘p’, the circuit behaves as an adder or as a subtractor. If the control input p is ‘0’, the Ex-OR gate on input B1 passes B1 unchanged and the circuit behaves as an adder; if p is ‘1’, the circuit behaves as a subtractor.

If one input to the Ex-OR gate is ‘0’, the output of the gate is the other input itself, i.e. 0⊕A=A: if input A is ‘0’ the output is ‘0’, and if input A is ‘1’ the output is ‘1’, so the circuit acts as an adder as discussed above. Similarly, if one input to the Ex-OR gate is ‘1’, the output of the gate is the complemented input, i.e. 1⊕A=A’: if input A is ‘0’ the output is ‘1’, and if input A is ‘1’ the output is ‘0’, so the circuit acts as a subtractor.

Consider the example of a 4-bit adder/subtractor. One 4-bit input A is A3, A2, A1, A0, and the other 4-bit input B is B3, B2, B1, B0. S3, S2, S1, S0 are the sums of the full adders, B3’, B2’, B1’, B0’ are the inverted inputs, and CO4 is the carry obtained from A3 − B3. The subtraction of 4-bit numbers is obtained by using 2’s complement notation.
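The adder/subtractor behaviour described above can be sketched in software. The following Python model is only an illustration, not the authors' VHDL. The function names (`full_adder`, `add_sub`, `to_bits`, `from_bits`) are hypothetical, and the model assumes, as in the usual textbook circuit, that the control line p also drives the initial carry Ci1, so that inverting B and adding 1 forms the 2's complement.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub(a_bits, b_bits, p):
    """N-bit ripple-carry adder/subtractor.

    a_bits, b_bits: lists of 0/1 values, least significant bit first.
    p: control line, 0 = add, 1 = subtract.
    Assumption: p also feeds the initial carry (2's complement subtraction).
    """
    carry = p
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # The Ex-OR gate passes b when p = 0 and inverts it when p = 1.
        s, carry = full_adder(a, b ^ p, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Unsigned integer to a list of n bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (least significant first) back to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


# 4-bit example: 9 + 5 and 9 - 5
s_add, c_add = add_sub(to_bits(9, 4), to_bits(5, 4), p=0)
s_sub, c_sub = add_sub(to_bits(9, 4), to_bits(5, 4), p=1)
print(from_bits(s_add), c_add)  # 14 0
print(from_bits(s_sub), c_sub)  # 4 1
```

In subtract mode a final carry of 1 indicates that the difference is non-negative, which is the usual 2's complement convention.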

Fig 1.1 N-bit parallel adder/subtractor block diagram

VOLUME 1, ISSUE 1, AUG 2021



An N-bit parallel adder/subtractor can be implemented as discussed above. A 64-bit parallel adder/subtractor requires 64 full adders connected in cascade. For example, if the delay of each full adder is 8 ns, the carry has to ripple through all 64 stages, so the final output is available only after 8 ns × 64 = 512 ns.
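The ripple-carry estimate generalizes to any word length. As a sketch, writing N for the number of cascaded full adders and t_FA for the delay of one stage (both symbols are introduced here for illustration and do not appear in the original), the worst-case delay grows linearly with N:

```latex
\[
t_{\text{total}} = N \cdot t_{FA}
\qquad\Longrightarrow\qquad
t_{\text{total}} = 64 \times 8\,\text{ns} = 512\,\text{ns}
\]
```

This assumes every stage contributes the same delay and that the carry propagates through all N stages, which is the worst case for a ripple-carry structure.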

II. Literature Review

Many researchers have used N-bit parallel adder/subtractors for designing arithmetic and logic circuits. Single-precision and double-precision floating-point notations are used for adding unsigned numbers. The carry select adder provides the optimal results. A device utilization summary and timing analysis are carried out by the researchers [1]. A Binary Integer Decimal (BID) encoding method is used for adding decimal floating-point numbers [2]. A Virtex-5 FPGA hardware kit is used for experimental purposes, and a synthesis report is discussed. A 64-bit adder is designed using a ripple carry adder and a carry look-ahead adder with different specifications [3]; a Spartan-7 FPGA kit is used. 64-bit and 32-bit adders are designed using a pipelining approach [4]; area, power, and timing are analyzed along with the back-end design. N-bit FPGA-based parallel adder/subtractors have been designed by different researchers [5][6][7].

III. Proposed Work

In digital design, engineers choose a hardware description language to describe any complex logic function. For example, if a digital design engineer had to design the multiplier or complex multiplier circuit from scratch for every project that needs one, it would be reinventing the wheel and a waste of time. Similarly, continually re-coding and reusing the same code by hand is difficult and costs more time and money. One solution to this problem is to use the Xilinx IP (intellectual property) [8] cores. An IP core is a piece of HDL code that design engineers have already written to perform a specific task, which saves the designer’s time.

Figure 3.1 shows the Xilinx Complex Multiplier [9] 3.1 IP core. To open the IP core, follow these steps: open the Xilinx ISE or Xilinx Vivado suite, click File, and then click New Project; give the project a name and specify the device (Virtex-5) details; then click IP (CORE Generator & Architecture Wizard). The window shown in figure 3.1 appears, in which many IP cores are available. The complex multiplier used in this design is found under the basic elements.

Fig 3.1 IP core generation wizard

Figure 3.3 shows the complex multiplier IP core 3.0. The IP core window shown in figure 3.4 has two halves. The left side has the IP symbol, which is a prototype of the RTL but not exactly the RTL. The exact RTL can be observed after writing the HDL code and instantiating the complex multiplier IP core. Ar[7:0] and AI[7:0] are the 8-bit real and imaginary parts of one input, and Br[7:0] and BI[7:0] are the 8-bit real and imaginary parts of the other input. ‘clk’ is the clock input. Pr[16:0] is the 17-bit real output, and PI[16:0] is the 17-bit imaginary output. These five inputs and two outputs are enabled, and all the remaining pins are disabled. Round_Cy is the carry input that facilitates unbiased rounding, “CE” is the clock enable, and “SCLR” is the synchronous clear. Round_Cy, CE, and SCLR are disabled; all the disabled inputs and outputs are optional.

Fig 3.2 64-bit parallel adder/subtractor

Fig 3.3 Xilinx Complex multiplier IP core

Fig 3.4 Complex Multiplier IP core

The right side of the IP core window has the AR/AI operand settings. The operand width can be set from a minimum of 8 bits to a maximum of 64 bits, so each operand, both real and imaginary, can take 2^8 = 256 distinct values when 8 bits are selected and up to 2^64 distinct values at the maximum width. The BR/BI operands are configured in the same way. Under the multiplier construction option, 4 DSP slices are used if performance is selected as the optimization goal, and 3 DSP slices are used if the resource sharing approach is selected. The designer also has the option of using look-up tables (LUTs). The product output width is chosen by default for the 8-bit complex multiplication: the real and imaginary products each span bits 16 down to 0, but they can be truncated.
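The difference between the 4-DSP and 3-DSP options can be related to a well-known arithmetic identity: a complex product can be computed with four real multiplications or, after rearrangement, with three real multiplications and a few extra additions. The Python sketch below illustrates only this identity; whether the core implements exactly this rearrangement is an assumption here, and the function names `cmul_four` and `cmul_three` are hypothetical.

```python
def cmul_four(ar, ai, br, bi):
    """Direct form: four real multiplications, two additions."""
    pr = ar * br - ai * bi
    pi = ar * bi + ai * br
    return pr, pi


def cmul_three(ar, ai, br, bi):
    """Rearranged form: three real multiplications, five additions."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2


# Both forms agree on a sweep of signed 8-bit operand values.
samples = range(-128, 128, 17)
agree = all(
    cmul_four(ar, ai, br, bi) == cmul_three(ar, ai, br, bi)
    for ar in samples for ai in samples
    for br in samples for bi in samples
)
print(agree)  # True
```

The three-multiplier form trades one multiplier for additional adders, which is the kind of trade-off a resource-oriented setting would be expected to favour.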

To design the 64-bit parallel adder/subtractor, an IP core is used that can add, subtract, and add/subtract signed and unsigned numbers of up to 256 bits. It can therefore be configured as a parallel adder, a parallel subtractor, or a parallel adder cum subtractor.

Figure 3.2 shows the 64-bit unsigned parallel adder/subtractor. The IP core window shown above has two halves. The left side has the IP symbol, which is a prototype of the RTL but not exactly the RTL. The exact RTL can be observed after writing the HDL code and instantiating the adder IP core. A[63:0] is the 64-bit augend/minuend input, B[63:0] is the other input (addend/subtrahend), and ‘clk’ is the clock input. These three inputs are enabled and the remaining ports are disabled. The remaining inputs are “ADD”, which is used when the add/subtract module is selected; “C_in”, the carry input; “CE”, the clock enable; “SCLR”, the synchronous clear; “SSET”, the synchronous set; and “SINIT”, the synchronous init. The outputs are S[64:0], the 65-bit sum/difference output, which is enabled, and C_Out, the carry out, which is disabled. All the disabled inputs and outputs are optional. For any N-bit parallel adder/subtractor, add is a one-bit input: if add is ‘1’, the addition of the 64-bit operands is performed; otherwise, subtraction is carried out. The right side has the component selection. Under the component selection, the implementation type has two options, Fabric and DSP. The Fabric implementation uses 65 LUTs and 65 flip-flops. The DSP48 implementation is not available for a 64-bit adder/subtractor; the maximum subtraction width allowed with one DSP48 slice inside the FPGA is 47 bits.

The RTL schematic of the 8-bit complex multiplier IP core is shown in figure 3.5. Xilinx generates an .ngr file for the RTL schematic. “clk” is the clock input; ar(7:0) and ai(7:0) are the 8-bit real and imaginary parts of input 1; br(7:0) and bi(7:0) are the 8-bit real and imaginary parts of input 2. pr(16:0) is the real part of the output signal, and pi(16:0) is the imaginary part of the output signal. Figure 3.4 is the prototype of the RTL, and figure 3.5 is the exact RTL.




Fig 3.5 RTL view of 8 bit complex multiplier IP core

Fig. 3.6 RTL schematic of 64-bit parallel adder/subtractor IP

core

The RTL schematic of the 64-bit parallel adder/subtractor IP core is shown in figure 3.6. Xilinx generates an .ngr file for the RTL schematic. “clk” is the clock input, a(63:0) is the augend/minuend input to the adder/subtractor, and b(63:0) is the other input (addend/subtrahend). add is a one-bit input: if add is high, the RTL acts as an adder, and if add is at logic level zero, the RTL acts as a subtractor. s(64:0) is the sum/difference output. Figure 3.2 is the prototype of the RTL and figure 3.6 is the exact RTL. From the above discussion, it can be concluded that add is the control input line, like ‘p’ in fig. 1.1 but with the opposite polarity: if add is ‘1’, the addition of the 64-bit operands is carried out; otherwise, subtraction is carried out.

Fig. 3.7 Internal schematic of 64-bit parallel adder/subtractor

IP core

The internal schematic of the 64-bit parallel adder/subtractor IP core is shown in fig. 3.7. In the instantiation of the generated core, x(63:0) is mapped to a(63:0) and y(63:0) is mapped to b(63:0). The clk pin is mapped to the signal of the same name, ‘p’ is mapped to add, and the output s(64:0) is mapped to sp(64:0).

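The multiplication of two complex numbers is illustrated below:

\[
\begin{aligned}
(a_r + i\,a_i)(b_r + i\,b_i) &= a_r b_r + i\,a_r b_i + i\,a_i b_r + i^2 a_i b_i \\
&= a_r b_r + i^2 a_i b_i + i\,(a_r b_i + a_i b_r) \\
&= (a_r b_r - a_i b_i) + i\,(a_r b_i + a_i b_r) \qquad (\text{since } i^2 = -1)
\end{aligned}
\]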
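As a worked instance of the complex product, the sketch below evaluates it for the input values used in the simulation (ar = 3, ai = 2, br = 4, bi = 5) and then checks how many bits a full-precision result needs for 8-bit operands. It is an illustration only: the helper names `complex_product` and `signed_width` are hypothetical, and the operands are assumed to be signed 2's complement values.

```python
from itertools import product


def complex_product(ar, ai, br, bi):
    """Return (pr, pi) for (ar + i*ai) * (br + i*bi)."""
    return ar * br - ai * bi, ar * bi + ai * br


def signed_width(value):
    """Bits needed to hold value in 2's complement."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1


pr, pi = complex_product(3, 2, 4, 5)
print(pr, pi)  # 2 23

# The product is linear in each operand, so the extreme outputs occur
# at the corners of the signed 8-bit range (assumed operand format).
corners = (-128, 127)
extremes = [
    part
    for operands in product(corners, repeat=4)
    for part in complex_product(*operands)
]
print(max(signed_width(v) for v in extremes))  # 17
```

Under these assumptions the widest result needs 17 bits, which matches the [16:0] range of the pr and pi outputs.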

Fig. 3.8 8-bit Complex multiplier simulation using IP Core




Fig 3.9 Complex multiplier example

Figure 3.8 shows the ISim simulation waveform of the 8-bit complex multiplier using the IP core. ISim is the default Xilinx simulator, but a third-party simulator such as ModelSim, a product of Mentor Graphics [10], gives the same simulation waveform. One input in the simulation of fig. 3.8 is “clk”, the clock signal, which is active on the rising edge. ar[7:0] is the 8-bit real input and its decimal value is 3; ai[7:0] is the 8-bit imaginary input and its decimal value is 2. br[7:0] is the 8-bit real input and its decimal value is 4; bi[7:0] is the 8-bit imaginary input and its decimal value is 5. The output b[4:0] is 5 bits wide; its binary value is “01111” and its decimal value is 15. From the above figure, it can be observed that the integer square root of 255 is 15, where 255 is the maximum value of the 8-bit input data.

Figure 3.10 shows the ISim simulation waveform of the 64-bit parallel adder. ISim is the default Xilinx simulator, but a third-party simulator such as ModelSim [11], a product of Mentor Graphics, gives the same simulation waveform. In the simulation of fig. 3.10, the input a[63:0] has the hexadecimal value “0000000000002000”, which is 8192 in decimal. Similarly, the input b[63:0] has the hexadecimal value “0000000000001000”, which is 4096 in decimal. The output s[64:0] has the hexadecimal value “000000000000003000”, which is 12,288 in decimal. clk is the clock signal, active on the rising edge. If the input add is at the high logic level, i.e. ‘1’, the addition of the two 64-bit numbers is performed.

Figure 3.11 shows the ISim simulation waveform of the 64-bit parallel subtractor. In the simulation of fig. 3.11, the input a[63:0] has the hexadecimal value “0000000000002400”, which is 9216 in decimal. Similarly, the input b[63:0] has the hexadecimal value “0000000000001200”, which is 4608 in decimal. The output s[64:0] has the hexadecimal value “000000000000001200”, which is 4608 in decimal. clk is the clock signal, active on the rising edge. If the input add is at the low logic level, i.e. ‘0’, the subtraction of the two 64-bit numbers is performed. The simulation waveforms of figures 3.10 and 3.11 are with respect to unsigned decimals.

Fig. 3.10 64 bit parallel adder simulation using IP Core

Fig. 3.11 64 bit parallel subtractor simulation using IP-

Core
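The values reported for the adder and subtractor simulations of figures 3.10 and 3.11 can be cross-checked with a few lines of Python. This is a behavioural sketch, not the IP core itself: the function name `add_sub64` is hypothetical, and the result is assumed to wrap to the 65-bit width of s[64:0].

```python
WIDTH = 64
OUT_MASK = (1 << (WIDTH + 1)) - 1  # s[64:0] is 65 bits wide


def add_sub64(a, b, add):
    """add = 1 returns a + b, add = 0 returns a - b, as a 65-bit value."""
    result = a + b if add else a - b
    return result & OUT_MASK  # assumed wrap-around to the output width


s_add = add_sub64(0x2000, 0x1000, add=1)
s_sub = add_sub64(0x2400, 0x1200, add=0)
print(hex(s_add), s_add)  # 0x3000 12288
print(hex(s_sub), s_sub)  # 0x1200 4608
```

Both results agree with the hexadecimal and decimal values read from the waveforms.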

Table 3.1 64 bit parallel adder/ subtractor IP core

synthesis report

Table 3.1 shows the synthesis report of the 64-bit parallel adder/subtractor IP core. The numbers of slice registers, slice look-up tables, fully used LUT-FF pairs, input/output blocks, and buffer memories are 65, 66, 65, 194, and 1, respectively. The maximum combinational path delay of the parallel adder/subtractor circuit is 2.826 ns, of which 2.540 ns is logic delay and 0.286 ns is routing delay. The total real time to XST (Xilinx Synthesis Technology) completion is 21.00 s, and the total CPU time to XST completion is 21.02 s. Total memory usage is 4505144 kilobytes.

Fig. 3.12 64 bit parallel adder/subtractor power report

The power report of the 64-bit parallel adder/subtractor is shown in figure 3.12. The total power [11] consumed by the design is 0.560 W. Xilinx generates a .pcf file to create the power report. The Xilinx XPower Analyzer software is required to estimate the power dissipation, and its input is the .ncd (native circuit description) file. When the .ncd file is added to the XPower Analyzer, it gives the power report of the design. The power analysis software reports the synthesis summary as well as the total on-chip power of the design. Comparing fig. 3.6 and table 3.1, it can be observed that the design occupies 65 slice registers, and the same result is obtained in the power report. Out of the 480 input/output blocks available, only 194 are used in the design. The clock input is the global clock of the design, which has one clock input in fig. 3.6 and also in table 3.1.

IV. Conclusion

The VHDL implementation of a 64-bit adder/subtractor and a complex multiplier using IP core generation is performed. The device utilization summary, RTL view, simulation results, and power report have been obtained with the Virtex-5 XC5VXT50T FPGA [13] device, which is built on 65 nm technology. Xilinx ISim is used for simulation analysis, and Xilinx ISE 14.7 is used for synthesis and place and route. The total power consumed by the design is 0.56 W.

Acknowledgement

The authors would like to thank the VLSI Design Lab, Department of Electronics and Communication Engineering, Nalla Malla Reddy Engineering College, Hyderabad, for providing cooperation and laboratory facilities to carry out this research work.

References

[1] Parte, Ragini, and Jitendra Jain. "Analysis of Effects of using Exponent Adders in IEEE-754 Multiplier by VHDL." 2015 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2015]. IEEE, 2015.

[2] Farmahini-Farahani, Amin, Charles Tsen, and Katherine Compton. "FPGA Implementation of a 64-Bit BID-based decimal floating-point adder/subtractor." 2009 International Conference on Field-Programmable Technology. IEEE, 2009.

[3] Reddy, Konda Sai Prakash. "Designing Various 64-bit Adders using VHDL in VIVADO." Journal of Innovation in Electronics and Communication Engineering 9.2 (2019): 6-11.

[4] Aarthy, M. "ASIC Implementation of 32 and 64 bit Floating Point ALU using Pipelining." International Journal of Computer Applications 94.17 (2014).

[5] Jaiswal, Manish Kumar, and Ray CC Cheung. "High performance FPGA implementation of double precision floating point adder/subtractor." International Journal of Hybrid Information Technology 4.4 (2011): 71-80.

[6] Vitoroulis, Konstantinos, and Asim J. Al-Khalili. "Performance of parallel prefix adders implemented with FPGA technology." 2007 IEEE Northeast Workshop on Circuits and Systems. IEEE, 2007.

[7] Khan, Mozammel HA, and Marek A. Perkowski. "Quantum ternary parallel adder/subtractor with partially-look-ahead carry." Journal of Systems Architecture 53.7 (2007): 453-464.

[8] Ghosh, Santosh, Ingrid Verbauwhede, and Dipanwita Roychowdhury. "Core based architecture to speed up optimal ate pairing on FPGA platform." International Conference on Pairing-Based Cryptography. Springer, Berlin, Heidelberg, 2012.

[9] Rao, K. Deergha, Ch Gangadhar, and Praveen K. Korrai. "FPGA implementation of complex multiplier using minimum delay Vedic real multiplier architecture." 2016 IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON). IEEE, 2016.

[10] Rangaraju, H. G., et al. "Low power reversible parallel binary adder/subtractor." arXiv preprint arXiv:1009.6218 (2010).

[11] Akhter, Shamim, Gaurav Raturi, and Shaheen Khan. "Analysis and design of residue number system based building blocks." 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN). IEEE, 2018.

