
http://www.cs.nctu.edu.tw/~ldvan/

Key Arithmetic Units

Lan-Da Van (范倫達), Ph. D.

Department of Computer Science National Chiao Tung University

Taiwan, R.O.C. Spring, 2011

Source: Prof. M. B. Lin, Digital System Designs and Practices, Wiley, 2008. Slides adopted from Chapter 1 of this book.

Digital Systems Design

Lecture 1

Outlines

Describe both addition and subtraction modules

Understand the principles of the carry-look-ahead (CLA) adder

Describe the operations of multiplication

Describe the operations of division

Describe the design of the arithmetic-logic unit (ALU)


Addition and Subtraction

The bottleneck of a conventional n-bit ripple-carry adder is the generation of the carry needed at each stage: every stage must wait for the carry from the stage below it.

To overcome this, many schemes have been proposed, including

— carry-look-ahead (CLA) adder

— parallel-prefix adders: Kogge-Stone adder, Brent-Kung adder
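For reference, the ripple-carry baseline can be sketched as below. This is a minimal illustration, not a module from the source: the name `addsub_ripple`, its ports, and the `sub` control input are assumptions. It shows why the carry is the bottleneck (each `c[i+1]` depends on `c[i]`) and how the same adder performs subtraction by adding the two's complement of `y`.

```verilog
// Hypothetical n-bit ripple-carry adder/subtractor (names are illustrative).
// sub = 0: result = x + y;  sub = 1: result = x - y (two's complement).
module addsub_ripple #(parameter N = 4) (
  input  [N-1:0] x, y,
  input          sub,
  output [N-1:0] result,
  output         cout
);
  wire [N-1:0] b = y ^ {N{sub}};   // invert y when subtracting
  wire [N:0]   c;                  // carry chain
  assign c[0] = sub;               // the +1 that completes the two's complement
  genvar i;
  generate for (i = 0; i < N; i = i + 1) begin: fa
    // one full adder per bit; c[i+1] must wait for c[i]
    assign result[i] = x[i] ^ b[i] ^ c[i];
    assign c[i+1]    = (x[i] & b[i]) | (c[i] & (x[i] ^ b[i]));
  end endgenerate
  assign cout = c[N];
endmodule
```

When subtracting, `cout = 1` means that no borrow occurred (x >= y for unsigned operands).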


A CLA adder

Define two new functions:

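— carry generate ($g_i$): $g_i = x_i \cdot y_i$

— carry propagate ($p_i$): $p_i = x_i \oplus y_i$

[Figure: one adder stage with inputs $x_i$, $y_i$, $c_i$, internal signals $p_i$, $g_i$, and outputs $s_i$, $c_{i+1}$]

$$s_i = p_i \oplus c_i$$

$$c_{i+1} = g_i + p_i c_i$$

$$c_1 = g_0 + p_0 c_0$$

$$c_2 = g_1 + p_1 c_1 = g_1 + p_1 (g_0 + p_0 c_0) = g_1 + p_1 g_0 + p_1 p_0 c_0$$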



A Carry-Lookahead Generator

[Figure: carry-lookahead generator that produces c1, c2, c3, and c4 from p0 to p3, g0 to g3, and c0]


A CLA Adder

[Figure: 4-bit CLA adder built from three blocks: a pg generator (inputs x0 to x3 and y0 to y3, outputs p0 to p3 and g0 to g3), a CLA generator (input c0, outputs c1 to c4), and a sum generator (outputs s0 to s3)]


A CLA Adder

// a 4-bit CLA adder using assign statements
module cla_adder_4bits(x, y, cin, sum, cout);
  // inputs and outputs
  input [3:0] x, y;
  input cin;
  output [3:0] sum;
  output cout;
  // internal wires
  wire p0, g0, p1, g1, p2, g2, p3, g3;
  wire c4, c3, c2, c1;
  // compute the p for each stage
  assign p0 = x[0] ^ y[0], p1 = x[1] ^ y[1],
         p2 = x[2] ^ y[2], p3 = x[3] ^ y[3];
  // compute the g for each stage
  assign g0 = x[0] & y[0], g1 = x[1] & y[1],
         g2 = x[2] & y[2], g3 = x[3] & y[3];
  // compute the carry for each stage
  assign c1 = g0 | (p0 & cin),
         c2 = g1 | (p1 & g0) | (p1 & p0 & cin),
         c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & cin),
         c4 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0) |
              (p3 & p2 & p1 & p0 & cin);
  // compute Sum
  assign sum[0] = p0 ^ cin, sum[1] = p1 ^ c1,
         sum[2] = p2 ^ c2, sum[3] = p3 ^ c3;
  // assign carry output
  assign cout = c4;
endmodule



A CLA Adder -- Using Generate Statements

// an n-bit CLA adder using generate loops
module cla_adder_generate(x, y, cin, sum, cout);
  // inputs and outputs
  parameter N = 4; // define the default size
  input [N-1:0] x, y;
  input cin;
  output [N-1:0] sum;
  output cout;
  // internal wires
  wire [N-1:0] p, g;
  wire [N:0] c;
  // assign input carry
  assign c[0] = cin;
  genvar i;
  // compute generate and propagate for each stage
  generate for (i = 0; i < N; i = i + 1) begin: pq_cla
    assign p[i] = x[i] ^ y[i];
    assign g[i] = x[i] & y[i];
  end endgenerate
  // compute carry for each stage: c[i] = g[i-1] | (p[i-1] & c[i-1])
  generate for (i = 1; i < N+1; i = i + 1) begin: carry_cla
    assign c[i] = g[i-1] | (p[i-1] & c[i-1]);
  end endgenerate
  // compute sum
  generate for (i = 0; i < N; i = i + 1) begin: sum_cla
    assign sum[i] = p[i] ^ c[i];
  end endgenerate
  assign cout = c[N]; // assign final carry (index is the parameter N, not n)
endmodule // end of cla_adder_generate module

Synthesis results (Virtex 2 XC2V250 FG456 -6):

n          4      8      16     32
f (MHz)    104.3  78.9   53.0   32.0
LUTs       8      16     32     64



A CLA Adder --- Using Generate Statements

[Figure: synthesized schematic of cla_adder_generate for N = 4, showing p[3:0], g[3:0], the carry_cla stages producing c[1] to c[4], and sum[3:0], with inputs x[3:0], y[3:0], cin and output cout]

Shift-and-Add Multiplication

Rules for a multiple-bit multiplicand times a 1-bit multiplier:

1. The partial product is the same as the multiplicand if the multiplier bit is 1; otherwise,

2. The partial product is 0.

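[Figure: sequential shift-and-add multiplier datapath with multiplicand register M (m bits), an m-bit 2-to-1 MUX that selects 0 or M under control of Q[0], an m-bit adder with an (m + 1)-bit result, accumulator register A, and multiplier/partial-product register Q]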

Shift-and-Add Multiplication

Algorithm: Shift-and-add multiplication

Input: An m-bit multiplicand and an n-bit multiplier.

Output: The (m + n)-bit product.

Begin

1. Load multiplicand and multiplier into registers M and Q, respectively; clear register A and set loop count CNT equal to n.

2. repeat

   2.1 if (Q[0] == 1) then A = A + M;

   2.2 Right shift register pair A : Q one bit;

   2.3 CNT = CNT − 1;

   until (CNT == 0);

End
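The steps of this algorithm can be sketched as a behavioral Verilog model. This is an illustration under assumptions, not the book's implementation: the module name `shift_add_mult`, its ports, and the parameters `M_BITS` and `N_BITS` are hypothetical, and the loop is unrolled combinationally rather than clocked once per step as a sequential datapath would be.

```verilog
// Behavioral sketch of the shift-and-add algorithm (illustrative names).
module shift_add_mult #(parameter M_BITS = 4, N_BITS = 4) (
  input  [M_BITS-1:0]            multiplicand,   // register M
  input  [N_BITS-1:0]            multiplier,     // initial value of register Q
  output reg [M_BITS+N_BITS-1:0] product
);
  reg [M_BITS:0]   a;    // accumulator A, one extra bit for the adder carry
  reg [N_BITS-1:0] q;    // multiplier / low partial-product register Q
  integer cnt;
  always @* begin
    a = 0;                                   // step 1: clear A, load Q
    q = multiplier;
    for (cnt = N_BITS; cnt > 0; cnt = cnt - 1) begin
      if (q[0]) a = a + multiplicand;        // step 2.1: A = A + M
      {a, q} = {a, q} >> 1;                  // step 2.2: right shift A:Q
    end                                      // step 2.3: CNT = CNT - 1
    product = {a[M_BITS-1:0], q};            // A:Q holds the (m+n)-bit product
  end
endmodule
```

Keeping the adder's carry-out in the extra bit of `a` before the shift is what lets an m-bit adder produce the full (m + n)-bit product.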


A Basic Array Multiplier

A multiple-cycle structure can also be implemented by using an iterative logic structure.

[Figure: 4 × 4 partial-product array for X = x3 x2 x1 x0 (multiplicand) and Y = y3 y2 y1 y0 (multiplier). Row j holds the partial products x_i y_j shifted left by j bit positions, and the rows are summed to form the product bits.]

$$P = X \times Y = \sum_{i=0}^{m-1} x_i 2^i \cdot \sum_{j=0}^{n-1} y_j 2^j = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (x_i y_j) 2^{i+j} = \sum_{k=0}^{m+n-1} P_k 2^k$$
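The double sum over the partial products x_i y_j can be written directly with a generate loop. The sketch below is an assumption-laden illustration (module name `array_mult_unsigned`, the `acc` array, and the block label `row` are hypothetical): each row ANDs x with one bit of y and adds the shifted row to a running sum, and the adder structure is left to the synthesis tool rather than spelled out as full-adder cells.

```verilog
// Sketch of an unsigned array multiplier as a sum of partial-product rows.
module array_mult_unsigned #(parameter M = 4, N = 4) (
  input  [M-1:0]   x,      // multiplicand
  input  [N-1:0]   y,      // multiplier
  output [M+N-1:0] p
);
  wire [M+N-1:0] acc [0:N];              // acc[j]: sum of rows 0 .. j-1
  assign acc[0] = {(M+N){1'b0}};
  genvar j;
  generate for (j = 0; j < N; j = j + 1) begin: row
    wire [M-1:0] pp = x & {M{y[j]}};     // AND gates: x_i * y_j for all i
    // weight the row by 2^j and add it to the running sum
    assign acc[j+1] = acc[j] + ({{N{1'b0}}, pp} << j);
  end endgenerate
  assign p = acc[N];
endmodule
```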



A Basic Unsigned Array Multiplier

[Figure: 4 × 4 unsigned array multiplier built from full-adder (FA) cells with inputs x, y, Cin and outputs S, Cout, producing P0 to P7. The figure is annotated with the worked example X = 0111 (7), Y = 1011 (11), marks the critical path through the m × n array, and notes that two of the rows may be combined into one row.]


An Unsigned CSA Array Multiplier

[Figure: 4 × 4 unsigned carry-save-adder (CSA) array multiplier built from full-adder (FA) cells, producing P0 to P7. The final row is a ripple-carry adder or carry-look-ahead adder. The figure marks the critical path through the m × n array and notes that two of the rows may be combined into one row.]

Signed Array Multiplication

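Let X and Y be two two's complement numbers:

$$X = -x_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} x_i 2^i$$

$$Y = -y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j$$

$$P = XY = \left( -x_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} x_i 2^i \right) \left( -y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j \right)$$

$$= \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} + x_{m-1} y_{n-1} 2^{m+n-2} - \left( \sum_{i=0}^{m-2} x_i y_{n-1} 2^{i+n-1} + \sum_{j=0}^{n-2} x_{m-1} y_j 2^{j+m-1} \right)$$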
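One common way to avoid the subtraction in the last term is the modified Baugh-Wooley arrangement: complement every partial product that involves exactly one sign bit and add two constant 1s. The sketch below illustrates that idea for the square case m = n = N only; the module name `baugh_wooley_mult` and all signal names are assumptions, and the loop describes the arithmetic rather than a particular cell array.

```verilog
// Sketch of two's complement multiplication, Baugh-Wooley style (m = n = N).
module baugh_wooley_mult #(parameter N = 4) (
  input  [N-1:0]       x, y,    // two's complement operands
  output reg [2*N-1:0] p        // two's complement product
);
  integer i, j;
  reg bit_ij;
  always @* begin
    // correction constants 2^N and 2^(2N-1) (valid modulo 2^(2N))
    p = 0;
    p[N]     = 1'b1;
    p[2*N-1] = 1'b1;
    for (i = 0; i < N; i = i + 1)
      for (j = 0; j < N; j = j + 1) begin
        bit_ij = x[i] & y[j];
        // complement the partial products that involve exactly one sign bit
        if ((i == N-1) != (j == N-1)) bit_ij = ~bit_ij;
        p = p + ({{(2*N-1){1'b0}}, bit_ij} << (i + j));
      end
  end
endmodule
```

For example, with N = 4 and x = y = 1111 (−1), the uncomplemented terms contribute 49 + 64, the complemented terms contribute 0, and the constants contribute 144, giving 257, which is 1 modulo 256.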



[Figure: 4 × 4 two's complement partial-product array for X = x3 x2 x1 x0 (multiplicand) and Y = y3 y2 y1 y0 (multiplier), with two constant 1s added to the partial products to form the product bits P7 to P0]



[Figure: 4 × 4 two's complement CSA array multiplier built from full-adder (FA) cells, with constant 1 inputs and a final ripple-carry adder or carry-look-ahead adder, producing P0 to P7]

Baugh-Wooley Multiplier

[Figure: 8 × 8 Baugh-Wooley multiplier array with inputs x7 to x0 and y7 to y0 and product bits P15 to P0, built from A, AHA, AFA, NFA, ND, and FA cells plus an inverter and a constant 1]

NFA Gate on Left Side and AFA Gate on Right Side

[Figure: the NFA cell (left) and the AFA cell (right). Each contains a full adder (FA) with inputs S_in and C_in and outputs S_out and C_out, and takes the operand bits x_i and y_j.]

Booth Encoding

Instead of 3Y, try −Y, then increment the next partial product to add 4Y.

Similarly, for 2Y, try −2Y + 4Y in the next partial product.

Booth Multiplication

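$$P = A \times B = \sum_{i=0}^{2n-1} P_i 2^i \qquad (1)$$

where B can be written as

$$B = \sum_{i=0}^{(n-2)/2} (b_{2i-1} + b_{2i} - 2 b_{2i+1}) 2^{2i}, \quad b_{-1} = 0 \qquad (2)$$

$$P = A \times B = \sum_{i=0}^{(n-2)/2} (b_{2i-1} + b_{2i} - 2 b_{2i+1}) A \, 2^{2i} = \sum_{i=0}^{(n-2)/2} S_i \qquad (3)$$

where $S_i$ can be denoted as $S_i = (b_{2i-1} + b_{2i} - 2 b_{2i+1}) A \, 2^{2i}$.

[Equation (4): bit-level expansion of $S_i$ in terms of its bits $S_{i,j}$; garbled in the source]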
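The radix-4 recoding used in these equations (each digit is b_{2i-1} + b_{2i} − 2 b_{2i+1}, so every partial product is 0, ±A, or ±2A) can be sketched behaviorally as follows. The module name `booth_radix4_mult`, its ports, and the internal names are assumptions; N is assumed even; and the sign handling relies on Verilog signed arithmetic rather than on a sign-extension scheme.

```verilog
// Behavioral sketch of radix-4 (modified) Booth multiplication. N must be even.
module booth_radix4_mult #(parameter N = 8) (
  input  signed [N-1:0]       a,    // multiplicand A
  input  signed [N-1:0]       b,    // multiplier B
  output reg signed [2*N-1:0] p
);
  reg        [N:0]   b_ext;  // B with b[-1] = 0 appended on the right
  reg signed [N+1:0] s;      // selected multiple: 0, +A, -A, +2A, or -2A
  integer i;
  always @* begin
    b_ext = {b, 1'b0};
    p = 0;
    for (i = 0; i < N/2; i = i + 1) begin
      // triplet (b[2i+1], b[2i], b[2i-1]) encodes b[2i-1] + b[2i] - 2*b[2i+1]
      case ({b_ext[2*i+2], b_ext[2*i+1], b_ext[2*i]})
        3'b001, 3'b010: s =  a;
        3'b011:         s =  2 * a;
        3'b100:         s = -2 * a;
        3'b101, 3'b110: s = -a;
        default:        s =  0;          // 000 and 111
      endcase
      p = p + (s <<< (2*i));             // weight the partial product by 2^(2i)
    end
  end
endmodule
```

Only N/2 partial products are summed, half as many as in the array multipliers above.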



Sign-Generate Sign Extension Scheme

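$$S = S_{0,7} \sum_{j=8}^{15} 2^j \cdot 2^0 + S_{1,7} \sum_{j=8}^{13} 2^j \cdot 2^2 + S_{2,7} \sum_{j=8}^{11} 2^j \cdot 2^4 + S_{3,7} \sum_{j=8}^{9} 2^j \cdot 2^6$$

$$= (\bar{S}_{0,7} 2^8 + 2^9) + (\bar{S}_{1,7} 2^{10} + 2^{11}) + (\bar{S}_{2,7} 2^{12} + 2^{13}) + (\bar{S}_{3,7} 2^{14} + 2^{15}) + 2^8 \qquad (5)$$

where S is the final sign result and the equality holds modulo $2^{16}$. Eqs. (3) and (5) can be mapped into the partial product diagram and modified Booth multiplier structure as shown in the following slide.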



Modified Booth Partial-Product Diagram for an 8x8 Multiplier

[Figure: modified Booth partial-product diagram for an 8 × 8 multiplier. Row i (i = 0 to 3) holds 1, $\bar{S}_{i,7}$, $S_{i,7}$, $S_{i,6}$, ..., $S_{i,0}$ together with the bit Ctrl_i[2], each row is offset two columns to the left of the previous one, and one additional constant 1 is added. The columns are labeled w = 0, w = 1, ..., w = n − 1 and are grouped into a least significant part (LP) and a most significant part (MP).]

An 8x8 Modified Booth Multiplier

[Figure: 8 × 8 modified Booth multiplier. Four Booth encoders take the multiplier groups {B[1:0], 0}, B[3:1], B[5:3], and B[7:5] and produce Ctrl0[2:0] to Ctrl3[2:0]. Each control word drives a row of selector (sel) cells fed by a7 to a0, with Ctrl_i[2] and constant 1s injected into the array. The selected partial products are summed by rows of half adders (HA) and full adders (FA) to form P15 to P0.]

An Unsigned Nonrestoring Division Algorithm

Algorithm: Unsigned nonrestoring division

Input: An n-bit dividend and an m-bit divisor.

Output: The quotient and remainder.

Begin

1. Load divisor and dividend into registers M and D, respectively; clear partial-remainder register R and set loop count CNT equal to n − 1.

2. Left shift register pair R : D one bit.

3. Compute R = R − M;

4. repeat

   4.1 if (R < 0) begin D[0] = 0; left shift R : D one bit; R = R + M; end

       else begin D[0] = 1; left shift R : D one bit; R = R − M; end

   4.2 CNT = CNT − 1;

   until (CNT == 0)

5. if (R < 0) begin D[0] = 0; R = R + M; end else D[0] = 1;

End
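The algorithm above can be followed step by step in a behavioral model. The sketch below is illustrative only: the module name `nonrestoring_div`, its ports, and the parameter `M_BITS` are assumptions, the divisor is assumed to be nonzero, and the loop is unrolled combinationally instead of using a clocked controller. The register R is given a sign bit and a guard bit so that the shifted partial remainder never overflows.

```verilog
// Behavioral sketch of unsigned nonrestoring division (illustrative names).
module nonrestoring_div #(parameter N = 8, M_BITS = 4) (
  input  [N-1:0]          dividend,
  input  [M_BITS-1:0]     divisor,      // assumed nonzero
  output reg [N-1:0]      quotient,
  output reg [M_BITS-1:0] remainder
);
  reg signed [M_BITS+1:0] r;   // partial remainder R (sign + guard bit)
  reg signed [M_BITS+1:0] m;   // divisor M, zero-extended
  reg        [N-1:0]      d;   // dividend/quotient register D
  integer cnt;
  always @* begin
    m = {2'b00, divisor};                  // step 1
    r = 0;
    d = dividend;
    {r, d} = {r, d} << 1;                  // step 2: left shift R:D
    r = r - m;                             // step 3
    for (cnt = N - 1; cnt > 0; cnt = cnt - 1) begin   // step 4
      if (r[M_BITS+1]) begin               // R < 0: quotient bit 0, then add
        d[0] = 1'b0;  {r, d} = {r, d} << 1;  r = r + m;
      end else begin                       // R >= 0: quotient bit 1, then subtract
        d[0] = 1'b1;  {r, d} = {r, d} << 1;  r = r - m;
      end
    end
    if (r[M_BITS+1]) begin                 // step 5: final bit and correction
      d[0] = 1'b0;  r = r + m;
    end else
      d[0] = 1'b1;
    quotient  = d;
    remainder = r[M_BITS-1:0];
  end
endmodule
```

With dividend 85 (01010101) and divisor 6 (0110), this sequence yields quotient 00001110 and remainder 0001.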



An Unsigned Nonrestoring Division Example

Example: 85 ÷ 6, where the dividend is D = 85 (decimal) = 01010101, the divisor is M = 6 (decimal) = 0110, and the 2's complement of 6 is 1010.

[Figure: step-by-step nonrestoring division of 01010101 by 0110. At each step the divisor is shifted right by one position relative to the dividend. A negative result (< 0) gives quotient bit Q = 0 and is followed by an addition (D + M); a non-negative result (> 0) gives Q = 1 and is followed by a subtraction (D − M). The final negative partial remainder is corrected by adding M.]

Hence quotient = 00001110

remainder = 0001

A Sequential Unsigned Nonrestoring Division

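A sequential implementation of unsigned nonrestoring division.

[Figure: datapath with divisor register M (m bits), a true/complement generator controlled by Sub/add, an n-bit adder with output Cout, remainder register R, and dividend/quotient register D with its least significant bit D[0]]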


An Unsigned Array Nonrestoring Divider

[Figure: 4-bit unsigned array nonrestoring divider with divisor inputs M3 to M0 and dividend inputs D0 to D3. Four rows of four CAS cells produce the quotient bits Q3 to Q0, and a final row of full adders (FA) performs the remainder correction and produces R3 to R0. The figure is annotated with a worked example. CAS: controlled adder and subtractor.]

Arithmetic-Logic Units

An arithmetic-logic unit (ALU) is often the major component of the datapath in many applications, especially in central processing unit (CPU) design. An ALU contains two portions:

Arithmetic unit:

— addition

— subtraction

— multiplication

— division

Logical unit:

— AND

— OR

— NOT


Shift Operations

Types of shift operations:

— Logical shift

The vacated bits are filled with 0s.

— Arithmetic shift

The vacated bits are filled with 0s (left shift) or with the sign bit (right shift).


Logical Shift Operations

Logical shift operations:

— Logical left shift

The input is shifted left a specified number of bit positions.

All vacated bits are filled with 0s.

— Logical right shift

The input is shifted right a specified number of bit positions.

All vacated bits are filled with 0s.



Arithmetic Shift Operations

Arithmetic shift operations:

— Arithmetic left shift

The input is shifted left a specified number of bit positions.

All vacated bits are filled with 0s.

Indeed, this is exactly the same as logical left shift.

— Arithmetic right shift

The input is shifted right a specified number of bit positions.

All vacated bits are filled with the sign bit.
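The four shift types map onto the Verilog operators `<<`, `>>`, `<<<`, and `>>>`. The sketch below is a small illustration with hypothetical names (`shifter`, `mode`, `amount`) and an assumed mode encoding. Note that `>>>` fills with the sign bit only when its left operand is signed, hence the `$signed` cast.

```verilog
// Sketch of a combinational shifter (mode encoding is an assumption).
module shifter #(parameter N = 8, S = 3) (
  input  [N-1:0]     din,
  input  [S-1:0]     amount,
  input  [1:0]       mode,   // 00: logical left, 01: logical right,
                             // 10: arithmetic left, 11: arithmetic right
  output reg [N-1:0] dout
);
  always @* begin
    case (mode)
      2'b00: dout = din << amount;             // vacated bits filled with 0s
      2'b01: dout = din >> amount;             // vacated bits filled with 0s
      2'b10: dout = din <<< amount;            // same result as logical left shift
      2'b11: dout = $signed(din) >>> amount;   // vacated bits filled with sign bit
    endcase
  end
endmodule
```

For din = 10010110 and amount = 2, the two left shifts give 01011000, the logical right shift gives 00100101, and the arithmetic right shift gives 11100101.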



Arithmetic-Logic Units

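[Figure: ALU followed by a shifter. The ALU takes operands A and B (x, y), Cin, and ALU_mode and produces Sum; the shifter is controlled by Shifter_mode and Shift_amount and produces the output F. The flags are N, Z, V, and C: Overflow is derived from the carries C_n and C_{n−1}, and the sign comes from F_n.]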
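A minimal sketch of the ALU portion with its four flags is given below. Everything specific in it is an assumption rather than the book's design: the module name `alu_flags`, the `alu_mode` encoding, and the choice of operations. The overflow flag is formed from the carry into and the carry out of the most significant bit, as the figure's C_n and C_{n−1} labels suggest. N is assumed to be at least 2.

```verilog
// Sketch of an ALU with N, Z, V, C flags (mode encoding is hypothetical).
module alu_flags #(parameter N = 8) (
  input  [N-1:0]     a, b,
  input              cin,
  input  [2:0]       alu_mode,
  output reg [N-1:0] f,
  output             n, z,     // negative and zero flags
  output reg         v, c      // overflow and carry flags
);
  reg [N-1:0] bx;      // b, or ~b for subtraction
  reg         c_nm1;   // carry into the most significant bit (C_{n-1})
  always @* begin
    bx = b;  c_nm1 = 1'b0;  c = 1'b0;  v = 1'b0;
    case (alu_mode)
      3'b000, 3'b001: begin      // 000: a + b + cin;  001: a + ~b + cin
        if (alu_mode[0]) bx = ~b;          // a - b when cin = 1
        {c_nm1, f[N-2:0]} = a[N-2:0] + bx[N-2:0] + cin;
        {c, f[N-1]}       = a[N-1] + bx[N-1] + c_nm1;
        v = c ^ c_nm1;                     // overflow = C_n xor C_{n-1}
      end
      3'b010:  f = a & b;
      3'b011:  f = a | b;
      3'b100:  f = ~a;
      default: f = {N{1'b0}};
    endcase
  end
  assign n = f[N-1];
  assign z = (f == {N{1'b0}});
endmodule
```

For example, 01111111 + 00000001 gives f = 10000000 with n = 1 and v = 1 (signed overflow) but c = 0, while 11111111 + 00000001 gives f = 0 with z = 1 and c = 1 but v = 0.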


Summary

You have learned the following items:

— Addition and subtraction modules

— Carry-look-ahead (CLA) adder

— Multipliers: shift-and-add multiplier, Baugh-Wooley multiplier, Booth multipliers

— Divider

— Arithmetic-logic unit (ALU)
