ch024

7/30/2019 ch024


Why DSP
- DSPs are a special class of microprocessors that are optimized for computing the real-time calculations used in signal processing.
- DSPs have an architecture that simplifies application designs and makes low-cost signal processing a reality.

characteristics
- fast, flexible computation units
- unconstrained data flow to and from the computation units
- extended precision and dynamic range in the computation units
- dual address generators
- efficient program sequencing and looping mechanisms

SHARC family of DSPs
- Harvard architecture
- one instruction per line
- each instruction ends with a semicolon (;)
- a label ends with a colon (:)
- comments start with an exclamation point (!)

Instructions example
R1 = DM(M0,I0), R2 = PM(M8,I8); ! a comment
Label:
R3 = R1 + R2;

memory
- SHARC uses different word sizes and address space sizes for instructions and data
- an instruction consists of 48 bits
- the basic data word is 32 bits
- an address is 32 bits

on-chip memory
- the 21061 has the smallest on-chip memory, 1 Mbit
- internal memory:
  - program memory (PM)
  - data memory (DM)

types of data
- 32-bit IEEE single-precision floating-point
- 40-bit IEEE extended-precision floating-point
- 32-bit integers

SHARC memory
- allows the program memory to hold both data and instructions
- allows extra data to be squeezed into the on-chip memory
- allows data to be fetched from both memories in parallel

- The PM bus is used to access either instructions or data.
- During a single cycle the processor can access two data operands, one over the PM bus and one over the DM bus.

- Each DAG keeps track of up to eight address pointers, eight modifiers, and eight length values.
- A pointer used for indirect addressing can be modified by a value in a specified register.

SHARC programming model
- The primary data registers are named R0-R15 or F0-F15:
  - R0-R15: used for integer operations
  - F0-F15: used for floating-point operations
- The registers are 40 bits long and hold either data type:
  - a 40-bit extended-precision floating-point value
  - 32-bit data types, held in the most-significant bits

CPU
- The CPU has three major data function units: an ALU, a multiplier, and a shifter.
- The three most significant mode registers for data operations:
  - arithmetic status (ASTAT)
  - sticky (STKY)
  - mode 1 (MODE1)

- The ALU updates seven status flags in the ASTAT register at the end of each operation.
- The ALU also updates four sticky status flags in the STKY register.
- Once set, a sticky flag remains high until explicitly cleared.

ASTAT
Bit     Name   Definition
0       AZ     ALU result zero or floating-point underflow
1       AV     ALU overflow
2       AN     ALU result negative
3       AC     ALU fixed-point carry
4       AS     ALU X input sign (ABS, MANT operations)
5       AI     ALU floating-point invalid operation
10      AF     Last ALU operation was a floating-point operation
31-24   CACC   Compare Accumulation register (results of last 8 compare operations)

STKY
Bit     Name   Definition
0       AUS    ALU floating-point underflow
1       AVS    ALU floating-point overflow
2       AOS    ALU fixed-point overflow
5       AIS    ALU floating-point invalid operation

- Rn, Rx, and Ry are arbitrary data registers R0-R15.
- Operations set various status bits in the ASTAT and STKY registers.
- COMP compares two values without modifying any data registers.

Rn = Rx + Ry             Add
Rn = Rx - Ry             Subtract
Rn = Rx + Ry + CI        Add with carry
Rn = Rx - Ry + CI - 1    Subtract with borrow
Rn = (Rx + Ry)/2         Average
COMP(Rx,Ry)              Compare

Rn = Rx + CI             Add carry
Rn = Rx + CI - 1         Add borrow
Rn = Rx + 1              Increment
Rn = Rx - 1              Decrement
Rn = -Rx                 Negate
Rn = ABS Rx              Absolute value
Rn = PASS Rx             Copy Rx to Rn

Rn = Rx AND Ry           Logical AND
Rn = Rx OR Ry            Logical OR
Rn = Rx XOR Ry           Logical exclusive OR
Rn = NOT Rx              Logical negate
Rn = MIN(Rx,Ry)          Minimum of Rx, Ry
Rn = MAX(Rx,Ry)          Maximum of Rx, Ry
Rn = CLIP Rx by Ry       Clip Rx within range [-Ry,Ry]
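The CLIP row is the least obvious entry in the table, so here is a minimal C sketch of what "clip Rx within range [-Ry,Ry]" means. The function name `clip_to_range` is hypothetical, and the sketch assumes Ry is non-negative; it does not model the ASTAT flags.

```c
/* Hypothetical C model of Rn = CLIP Rx by Ry.
 * Assumes ry >= 0; status flags are not modelled. */
int clip_to_range(int rx, int ry)
{
    if (rx > ry)
        return ry;      /* above the range: clamp to +Ry */
    if (rx < -ry)
        return -ry;     /* below the range: clamp to -Ry */
    return rx;          /* already inside [-Ry, Ry] */
}
```

Values inside the range pass through unchanged, so the operation behaves like a symmetric limiter.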

- All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
- The STKY register is a sticky version of the ASTAT register.
- STKY bits are set along with the ASTAT register bits, but are not cleared.
- STKY bits always remain set until cleared by an instruction.
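The difference between ASTAT and STKY can be sketched in C as follows. This is a simplified model, not the hardware definition: the struct `alu_flags` and the functions `alu_add` and `stky_clear` are hypothetical names, only AZ, AV, AN and AOS are modelled, and the bit positions are taken from the ASTAT and STKY tables in this deck.

```c
#include <stdint.h>

/* Bit positions as listed in the ASTAT and STKY tables. */
enum { ASTAT_AZ = 1u << 0, ASTAT_AV = 1u << 1, ASTAT_AN = 1u << 2 };
enum { STKY_AOS = 1u << 2 };

typedef struct {
    uint32_t astat;   /* rewritten by every operation */
    uint32_t stky;    /* bits are only ever set here  */
} alu_flags;

/* Reduce a wide result to 32 bits (two's complement wrap), portably. */
static int32_t wrap32(int64_t wide)
{
    uint32_t u = (uint32_t)wide;                      /* modulo 2^32 */
    return (u <= (uint32_t)INT32_MAX)
               ? (int32_t)u
               : (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Hypothetical model of a fixed-point Rn = Rx + Ry. */
int32_t alu_add(alu_flags *f, int32_t x, int32_t y)
{
    int64_t wide = (int64_t)x + (int64_t)y;
    int32_t r = wrap32(wide);

    f->astat = 0;                                     /* ASTAT reflects this operation only */
    if (r == 0) f->astat |= ASTAT_AZ;
    if (r < 0)  f->astat |= ASTAT_AN;
    if (wide > INT32_MAX || wide < INT32_MIN) {
        f->astat |= ASTAT_AV;
        f->stky  |= STKY_AOS;                         /* sticky: stays set afterwards */
    }
    return r;
}

/* Sticky bits go away only when software clears them. */
void stky_clear(alu_flags *f, uint32_t bits)
{
    f->stky &= ~bits;
}
```

After an overflowing add followed by an ordinary add, AV is clear again but AOS is still set, which is what lets a program check for overflow once at the end of a long computation.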

Saturation Mode
- The SHARC can perform saturation arithmetic on fixed-point values.
- All positive fixed-point overflows cause the maximum positive fixed-point number (0x7FFF FFFF) to be returned, and all negative overflows cause the maximum negative number (0x8000 0000) to be returned.

Saturation Mode
- In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range.
- Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
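To make the contrast between saturating and wrapping concrete, here is a small C sketch of a 32-bit add in both styles. The function names `add_sat` and `add_wrap` are hypothetical; they stand for the ALU behaviour with ALUSAT set and cleared respectively, and they ignore status flags.

```c
#include <stdint.h>

/* Saturating add: what the slides describe for ALUSAT set. */
int32_t add_sat(int32_t x, int32_t y)
{
    int64_t wide = (int64_t)x + (int64_t)y;   /* cannot overflow in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;   /* 0x7FFF FFFF */
    if (wide < INT32_MIN) return INT32_MIN;   /* 0x8000 0000 */
    return (int32_t)wide;
}

/* Wrapping add: the result wraps around the numeric range. */
int32_t add_wrap(int32_t x, int32_t y)
{
    uint32_t u = (uint32_t)x + (uint32_t)y;   /* modulo 2^32, well defined */
    return (u <= (uint32_t)INT32_MAX)
               ? (int32_t)u
               : (int32_t)(u - 0x80000000u) + INT32_MIN;
}
```

For signal samples, saturation turns an overflow into clipping, while wrapping turns a large positive value into a large negative one.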

- SHARC doesn't have a divide instruction.
- Iterative algorithms are used to compute both reciprocals and square roots.
- The RECIPS and RSQRTS operations are used to start these iterative algorithms.
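The idea of seeding an iterative reciprocal can be sketched in C with Newton-Raphson refinement. This is an illustration under stated assumptions, not the SHARC algorithm: `recip_seed` is a hypothetical stand-in for RECIPS that uses a simple linear estimate, the iteration count is chosen for double precision, and the divisor must be nonzero and finite.

```c
/* Hypothetical stand-in for the RECIPS seed: a rough estimate of 1/x.
 * Scales |x| into [0.5, 1) and applies a linear approximation whose
 * relative error is at most 1/17. x must be nonzero and finite. */
static double recip_seed(double x)
{
    double ax = (x < 0.0) ? -x : x;
    double scale = 1.0;
    while (ax >= 1.0) { ax *= 0.5; scale *= 0.5; }
    while (ax < 0.5)  { ax *= 2.0; scale *= 2.0; }
    double est = (48.0 / 17.0 - (32.0 / 17.0) * ax) * scale;
    return (x < 0.0) ? -est : est;
}

/* Newton-Raphson refinement: each step roughly squares the relative error. */
double recip_newton(double x, int iterations)
{
    double r = recip_seed(x);
    for (int i = 0; i < iterations; i++)
        r = r * (2.0 - x * r);
    return r;
}

/* Division built from the reciprocal, as a processor without a divide
 * instruction would do it. */
double divide(double n, double d)
{
    return n * recip_newton(d, 4);
}
```

Each iteration uses only multiplies and a subtract, which is why a seed instruction plus the multiplier and ALU is enough to replace a hardware divider.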

Floating-Point Rounding Modes
- If the TRUNC bit is set, the ALU rounds a result toward zero (truncation). If the TRUNC bit is cleared, the ALU rounds to nearest.
- The rounding modes used for floating-point arithmetic are controlled by two bits in the MODE1 register.

- Multiplication sets the MN (multiplier result negative), MV (multiplier overflow), MU (multiplier floating-point underflow), and MI (multiplier floating-point invalid operation) bits in the ASTAT register.

Fn = Fx + Fy             Add
Fn = Fx - Fy             Subtract
Fn = ABS(Fx + Fy)        Absolute value of sum
Fn = ABS(Fx - Fy)        Absolute value of difference
Fn = (Fx + Fy)/2         Average
COMP(Fx,Fy)              Compare
Fn = -Fx                 Negate

Fn = ABS Fx                              Absolute value
Fn = PASS Fx                             Copy Fx to Fn
Fn = RND Fx                              Round
Fn = SCALB Fx by Ry                      Scale exponent of Fx by Ry
Rn = MANT Fx                             Extract mantissa of Fx
Rn = LOGB Fx                             Convert exponent of Fx to integer
Rn = FIX Fx, Rn = TRUNC Fx               Convert floating-point to integer
Fn = FLOAT Rx by Ry, Fn = FLOAT Rx       Convert integer to floating-point

Fn = RECIPS Fx           Create seed for reciprocal
Fn = RSQRTS Fx           Create seed for reciprocal square root
Fn = Fx COPYSIGN Fy      Copy sign of Fy to Fx
Fn = MIN(Fx,Fy)          Minimum of Fx, Fy
Fn = MAX(Fx,Fy)          Maximum of Fx, Fy
Fn = CLIP Fx by Fy       Clip Fx within range [-Fy,Fy]

- The multiplier performs fixed-point and floating-point multiplication.
- It can also perform saturation, rounding, and setting the result to 0.
- Fixed-point multiplication produces an 80-bit result.

- Logical shifts fill with zeroes, while arithmetic shifts copy sign bits.
- The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift.
- Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
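The signed shift distance and the zero-fill versus sign-fill distinction can be modelled in C roughly as follows. The names `lshift` and `ashift` are hypothetical, the handling of distances of 32 or more is an assumption of this sketch, and the SZ, SV and SS flags are not modelled.

```c
#include <stdint.h>

/* Logical shift: positive distance shifts left, negative shifts right,
 * vacated bits are filled with zeroes. */
uint32_t lshift(uint32_t x, int dist)
{
    if (dist >= 32 || dist <= -32)
        return 0;                                /* everything shifted out */
    return (dist >= 0) ? (x << dist) : (x >> -dist);
}

/* Arithmetic shift: right shifts copy the sign bit into vacated bits.
 * Written with unsigned arithmetic to avoid undefined behaviour in C. */
int32_t ashift(int32_t x, int dist)
{
    uint32_t u = (uint32_t)x;
    uint32_t r;
    if (dist >= 0) {
        r = (dist >= 32) ? 0u : (u << dist);
    } else {
        int n = (dist <= -32) ? 31 : -dist;      /* 31 already leaves only sign bits */
        r = (x < 0) ? ~(~u >> n) : (u >> n);     /* sign-fill for negative inputs */
    }
    return (r <= (uint32_t)INT32_MAX)
               ? (int32_t)r
               : (int32_t)(r - 0x80000000u) + INT32_MIN;
}
```

A single operation with a signed distance covers both shift directions, so the same instruction can be used when the direction is only known at run time.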

Rn = LSHIFT Rx by Ry            Logical shift distance Ry
Rn = Rn OR LSHIFT Rx by Ry      Logical shift and logical OR
Rn = ASHIFT Rx by Ry            Arithmetic shift
Rn = Rn OR ASHIFT Rx by Ry      Arithmetic shift and logical OR
Rn = ROT Rx by Ry               Rotate distance Ry
Rn = BCLR Rx by Ry              Clear one bit in Rx
Rn = BSET Rx by Ry              Set one bit in Rx
Rn = BTGL Rx by Ry              Toggle one bit in Rx

Rn = EXP Rx (EX)         Extract exponent field from ALU
Rn = LEFTZ Rx            Extract number of leading 0s
Rn = LEFTO Rx            Extract number of leading 1s
Rn = FPACK Fx            Convert 32-bit floating-point to 16-bit floating-point
Fx = FUNPACK Rn          Convert 16-bit floating-point to 32-bit floating-point

Ex2-7 Data Operation Status Bits in the SHARC
- For the fixed-point ALU calculation -1 + 1 = 0, the ASTAT status bits are set as follows: AZ = 1, AU = 0, AN = 0, AV = 0, AC = 1, and AI = 0.
- For the floating-point operation -1E0 + 1E0 = 0E0, AOS (ALU fixed-point underflow) will be similarly set.

Ex2-7 Data Operation Status Bits in the SHARC
- For the fixed-point multiplier operation -2 * 3, the ASTAT bits are set as follows: MN = 1, MV = 0, MU = 1, and MI = 0.
- The multiplier has four STKY bits, none of which will be set:
  - MOS (multiplier fixed-point overflow)
  - MVS (multiplier floating-point overflow)
  - MUS (multiplier floating-point underflow)
  - MIS (multiplier floating-point invalid operation)

Ex2-7 Data Operation Status Bits in the SHARC
- For the following shifter operation,
  LSHIFT 0x7fffffff BY 3
  the ASTAT bits will be set as follows: SZ = 0, SV = 1, and SS = 0.
- The shifter has no sticky bits.

- Operands must be loaded into registers before operating on them.
- SHARC supplies special registers that are used to control loading and storing.
- SHARC has two data address generators (DAGs), one for the data memory and the other for the program memory.

DAGs
- Data address generator 1 (DAG1) generates 32-bit addresses on the DM Address Bus.
- Data address generator 2 (DAG2) generates 24-bit addresses on the PM Address Bus.
- Each DAG has four types of registers: Index (I), Modify (M), Base (B), and Length (L) registers.

DAGs
- The I register acts as a pointer to memory.
- The M register contains the increment value for advancing the pointer.
- B registers and L registers are used only for circular data buffers.
- The B register holds the base address (i.e. the first address) of a circular buffer.
- The L register contains the number of locations in (i.e. the length of) the circular buffer.

DAGs
- With two DAGs, the SHARC can perform two load-store operations per cycle.
- The DAG hardware automatically updates the register values so that a series of accesses can be performed very easily.
- DAGs are quite useful for sequential accesses.

DAGs
- Each data address generator has eight sets of primary registers.
- Having several sets allows for quicker access of multiple sets of data.
- The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

DAGs
- DAGs provide the following addressing modes
- immediate value
R0 = DM(0x2000000);
R0 = DM(_a); ! loads R0 with the contents of the variable a
DM(_a) = R0; ! stores R0 into memory location a

DAGs
- The immediate-value mode has the entire address in the instruction.
- The address bits take up most of the instruction, 32bits/40bits.

Post-modify with update mode
- used to sweep through a range of addresses
- uses an I register and a modifier, which is either an M register or an immediate value
- the I register specifies the address and is then updated by the modifier value
R0 = DM(I3,M1);
DM(I2,1) = R1;

Base-plus-offset addressing
- The address of the location to be fetched is computed as I + M, where I is the base and M is the modifier or offset.
- With I0 = 0x2000000 and M1 = 4,
  R0 = DM(M1,I0);
  loads DM(0x2000004) into R0.
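The two register-based addressing forms differ only in when the modifier is applied and whether the index register changes. A minimal C sketch, with hypothetical names (`dag_regs`, `load_premodify`, `load_postmodify`) and a plain array standing in for data memory:

```c
/* Hypothetical model of one I/M register pair in a DAG.
 * i is an index into the array that stands in for memory. */
typedef struct {
    int i;   /* index (pointer) register */
    int m;   /* modify register          */
} dag_regs;

/* Models R0 = DM(M, I): the address is I + M and I is left unchanged. */
int load_premodify(const int *mem, const dag_regs *d)
{
    return mem[d->i + d->m];
}

/* Models R0 = DM(I, M): the address is I, then I is updated by M. */
int load_postmodify(const int *mem, dag_regs *d)
{
    int value = mem[d->i];
    d->i += d->m;
    return value;
}
```

Base-plus-offset suits access to a field at a fixed distance from a pointer, while post-modify suits walking through an array one element per access.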

- A circular buffer is an array of n elements; when the (n + 1)th element is referenced, the reference goes to buffer location 0, wrapping around from the end to the beginning of the buffer.
- To set one up, the L register is set to a positive, nonzero value (the buffer length), the I register is set to the starting point in the circular buffer, and the B register of the same number is loaded with the base address of the circular buffer.
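The wrap-around that the DAG performs for a circular buffer can be sketched in C like this. The struct `circ_regs` and the function `circ_postmodify` are hypothetical names modelling the B, L and I registers of one register set; the sketch assumes the modifier is smaller in magnitude than the buffer length.

```c
/* Hypothetical model of one B/L/I register set.
 * All values are indices into the array that stands in for memory. */
typedef struct {
    int b;   /* base: first location of the buffer */
    int l;   /* length: number of locations        */
    int i;   /* index: current position            */
} circ_regs;

/* Post-modify access with circular wrap: read at I, then advance I by m
 * and fold it back into [b, b + l). Assumes |m| < l. */
int circ_postmodify(const int *mem, circ_regs *c, int m)
{
    int value = mem[c->i];
    int next = c->i + m;
    if (next >= c->b + c->l)
        next -= c->l;          /* ran off the end: wrap to the start */
    else if (next < c->b)
        next += c->l;          /* ran off the start: wrap to the end */
    c->i = next;
    return value;
}
```

Because the wrap is folded into the address update, a filter's delay line can be read and advanced without any explicit bounds test in the inner loop.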

- Bit-reversal addressing supports the fast Fourier transform (FFT).
- Bit-reversal addressing can be performed only in I0 and I8, as controlled by the BR0 and BR8 bits in the MODE1 register.
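What bit reversal does to an index can be shown with a short C function. This is a software illustration with a hypothetical name, `bit_reverse`; it reverses only the low `bits` bits of an index, which is the form an N-point FFT with N = 2^bits needs, and it does not claim to match how the hardware reverses a full address.

```c
/* Reverse the low `bits` bits of `index`.
 * For an 8-point FFT (bits = 3), index 1 (001) maps to 4 (100). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned k = 0; k < bits; k++) {
        reversed = (reversed << 1) | (index & 1u);  /* move the lowest bit to the other end */
        index >>= 1;
    }
    return reversed;
}
```

An FFT produces or consumes its data in this scrambled order, so doing the reversal in the address generator removes a separate reordering pass.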

- The SHARC memory architecture allows data to be stored in the program memory.
- This allows two data fetches per cycle:
  F0 = DM(M0,I0), F1 = PM(M8,I9);
  simultaneously loads F0 from data memory and F1 from program memory.

float dm a[N];
float pm b[N];
will place the a[] array in data memory and b[] in program memory

Ex2-8 C Assignments in SHARC
Instructions
x = (a + b) - c;
r0 for a, r1 for b, r2 for c, and r3 for x
R0 = DM(_a); ! get value of a
R1 = DM(_b); ! load value of b
R3 = R0 + R1; ! set result for x to a + b
R2 = DM(_c); ! get value of c
R3 = R3 - R2; ! complete computation of x
DM(_x) = R3; ! store x at proper location

Ex2-8 C Assignments in SHARC
Instructions
y = a*(b + c);
use r0 for a, r1 for b, and r2 for both c and y
R1 = DM(_b); ! load b
R2 = DM(_c); ! load c
R2 = R1 + R2; ! compute partial result for y
R0 = DM(_a); ! load a
R2 = R2 * R0; ! compute final value of y
DM(_y) = R2; ! store y

Ex2-8 C Assignments in SHARC
Instructions
z = (a << 2) | (b & 15);
r0 for a and z, r1 for b, and r3 to hold the bit mask to be ANDed
R0 = DM(_a); ! get value of a
R0 = LSHIFT R0 BY #2; ! perform shift
R1 = DM(_b); ! get value of b
R3 = #15; ! set up the bit mask for ANDing
R1 = R1 AND R3; ! perform logical AND
R0 = R1 OR R0; ! compute final value of z
DM(_z) = R0; ! store value of z

JUMP instruction
- JUMP foo jumps to the location foo
- Direct: specifies a 24-bit address in an immediate field
- Indirect: the address is supplied by the DAG2 data address generator
- PC-relative: specifies an immediate value that is added to the current PC

loop instruction
LCNTR = n, DO Label UNTIL LCE;
The loop instruction specifies the following:
- the length of the loop, in the loop counter LCNTR
- Label, the address of the last instruction in the loop
- the loop termination condition LCE, which stands for "loop counter expired"

True version   Description
EQ             ALU = 0
LT             ALU < 0
LE             ALU <= 0
AC             ALU carry
AV             ALU overflow

True version   Description           Complement
MV             Multiplier overflow   NOT MV
MS             Multiplier sign       NOT MS
SV             Shifter overflow      NOT SV
SZ             Shifter zero          NOT SZ
FLAG0_IN       Flag 0 input          NOT FLAG0_IN

True version   Description                Complement
FLAG1_IN       Flag 1 input               NOT FLAG1_IN
FLAG2_IN       Flag 2 input               NOT FLAG2_IN
FLAG3_IN       Flag 3 input               NOT FLAG3_IN
TF             Bit test flag              NOT TF
LCE            Loop counter expired
NOT LCE        Loop counter not expired

Ex2-9 if statement
if (a > b) {
    x = 5;
    y = c + d;
}
else x = c - d;

! test
R0 = DM(_a); ! load a
R1 = DM(_b); ! load b
COMP(R0,R1); ! compare a,b
IF LE JUMP fblock; ! jump if a <= b, i.e. the test fails
! true block

tblock: R0 = 5; ! get value for x
DM(_x) = R0; ! store value for x
R0 = DM(_c); ! get c
R1 = DM(_d); ! get d
R1 = R0 + R1; ! compute c + d
DM(_y) = R1; ! save value for y
JUMP other; ! skip false block

an example Ex2-9 if statement
! false block
fblock: R0 = DM(_c); ! get c
R1 = DM(_d); ! get d
R1 = R0 - R1; ! compute c - d
DM(_x) = R1; ! save value for x
other: ... ! code after if

if (a > b)
    y = c - d;
else
    y = c + d;

! load values
R1 = DM(_a); ! load a
R8 = DM(_b); ! load b
R2 = DM(_c); ! load c
R4 = DM(_d); ! load d
! compute both sum and difference

r12 = r2 + r4, r0 = r2 - r4;
! choose which one to save, copy it into r0 if necessary, then write to y
comp(r8,r1); ! compare b,a
if ge r0 = r12; ! a

- When control reaches the last instruction in the loop, the machine immediately returns to the head of the loop unless the loop counter has expired.
- This is a zero-overhead loop, because the jump back to the top of the loop (and its associated delays) is avoided.

- The loop instruction uses two stacks to handle nested loops (one loop contained inside another).
- The PC is in fact a stack; a separate stack holds the loop counters for all active loops.
- The PC stack is 30 deep and holds subroutine return addresses and loop addresses; the loop counter stack is 6 deep.

When the DO UNTIL is first encountered:
- the loop end address is pushed onto the PC stack
- the new loop counter value is pushed onto the loop counter stack
When execution reaches the loop end address:
- the CPU automatically decrements the loop counter and checks its value

- If the termination condition (which may be LCE or NOT LCE) is not satisfied, the PC is set to the instruction just after the DO UNTIL for another iteration.
- If the condition is satisfied, the two stacks are popped and execution continues at the instruction after the loop end address.

loop
for (i = 0, f = 0; i < N; i++)
    f = f + c[i] * x[i];
! loop setup
I0 = _a; ! I0 points to a[0]
M0 = 1; ! set up increment
I8 = _b; ! I8 points to b[0]
M8 = 1; ! set up postincrement mode

loop
! loop body
LCNTR = N, DO loopend UNTIL LCE;
! use postincrement mode
R1 = DM(I0,M0), R2 = PM(I8,M8);
R8 = R1*R2; ! multiply
loopend: R12 = R12 + R8; ! accumulate

loop
optimized:
! loop setup
I4 = _a; ! load a
I12 = _b; ! load b
R4 = R4 xor R4, R1 = DM(I4,M6), R2 = PM(I12,M14);
MR0F = R4, MODIFY(I7,M7);

loop
! start loop
LCNTR = 20, DO (PC,loop) UNTIL LCE;
loop: MRF = MRF + R2*R1 (SSI), R1 = DM(I4,M6), R2 = PM(I12,M14);
! loop clean-up
R0 = MR0F;
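The optimized loop loads the first pair of operands before the loop starts, and each iteration then multiplies the pair already in registers while fetching the next pair. A C sketch of that software-pipelined structure follows; the function name `dot_pipelined` is hypothetical, and unlike the assembly it guards the final fetch so it never reads past the end of the arrays.

```c
#include <stddef.h>

/* Software-pipelined dot product. Requires n >= 1.
 * acc plays the role of MRF; r1 and r2 play the roles of R1 and R2. */
long long dot_pipelined(const int *a, const int *b, size_t n)
{
    long long acc = 0;        /* accumulator cleared in loop setup   */
    int r1 = a[0];            /* operands preloaded before the loop  */
    int r2 = b[0];

    for (size_t k = 0; k < n; k++) {
        acc += (long long)r2 * r1;    /* multiply-accumulate on the current pair */
        if (k + 1 < n) {              /* fetch the next pair for the next pass   */
            r1 = a[k + 1];
            r2 = b[k + 1];
        }
    }
    return acc;               /* loop clean-up: read the accumulator */
}
```

On the SHARC the multiply-accumulate and both fetches are issued as one instruction, so the loop body is a single cycle per element; the C version only shows the ordering of the work.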

7/30/2019 ch024

82/97

procedure calls

CALL foo;

A call can also be executed conditionally:

IF GT CALL (PC,100);

This is a PC-relative call to a point 100 locations past the current PC value.

The CALL instruction pushes the current PC value plus 1 onto the PC stack before jumping to the target address.
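The call mechanism described above can be modeled in a few lines of C. This is a sketch under assumptions: the `Sequencer` type, the function names, and the stack depth of 16 are invented for illustration and are not taken from the slides or from SHARC documentation. `rts` models the return instruction, which takes the return address back off the PC stack.

```c
#include <assert.h>

enum { PC_STACK_DEPTH = 16 }; /* illustrative depth, an assumption */

/* Hypothetical model of the program sequencer's PC and PC stack. */
typedef struct {
    int pc;
    int stack[PC_STACK_DEPTH];
    int depth; /* number of return addresses currently stacked */
} Sequencer;

/* CALL target: push PC + 1, then continue at the target address. */
void call_abs(Sequencer *s, int target)
{
    assert(s->depth < PC_STACK_DEPTH);
    s->stack[s->depth++] = s->pc + 1;
    s->pc = target;
}

/* CALL (PC, offset): the target is relative to the current PC. */
void call_rel(Sequencer *s, int offset)
{
    call_abs(s, s->pc + offset);
}

/* Return: pop the saved address back into the PC. */
void rts(Sequencer *s)
{
    assert(s->depth > 0);
    s->pc = s->stack[--s->depth];
}
```

Starting from a PC of 40, `call_rel(&s, 100)` moves the PC to 140 and stacks 41; `rts(&s)` then resumes at 41.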


void f1(int a) {
f2(a);
}

The SHARC has a PC stack, so there is no need to push the return address; only the registers must be saved.

The SHARC does not have general-purpose stack operators, but the DAGs can be used to implement a stack with a little effort.


Pushing onto the stack uses postincrement mode, so the I register automatically points to the empty location at the top of the stack.

Reading values off the stack requires specifying a constant offset in the M field to provide the distance from the end of the stack frame to the variable.

Popping the stack means modifying the I register.


We use I1 to point to the stack and assume that M1 has been set to 1, the stack push increment, at the start of the program. Here is handwritten code for f1(), which includes a call to f2():


f1: R0 = DM(I1,-1); ! load argument a into R0 from stack
! call f2()
DM(I1,M1) = R0; ! push f2's argument onto the stack
CALL f2; ! call f2
! return from f1()
MODIFY(I1,-1); ! pop one element off stack
RTS; ! return
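The stack discipline used by this code can be sketched in C. The model is an assumption-laden illustration: `DagStack`, `push`, `peek`, and `pop` are invented names, the array size is arbitrary, and `f2` is a stand-in that only records the argument it finds on the stack. The index `i1` plays the role of I1 and always points to the empty location at the top of the stack.

```c
#include <assert.h>

enum { STACK_WORDS = 64 }; /* arbitrary size for the sketch */

/* Hypothetical model of a stack kept in data memory and addressed by I1. */
typedef struct {
    int dm[STACK_WORDS]; /* data memory region holding the stack */
    int i1;              /* index register: next empty location */
} DagStack;

/* Push: store at I1, then postincrement I1 by the push increment (1). */
void push(DagStack *s, int value)
{
    assert(s->i1 < STACK_WORDS);
    s->dm[s->i1] = value;
    s->i1 += 1;
}

/* Read at a constant (negative) offset from I1; I1 is left unchanged. */
int peek(const DagStack *s, int offset)
{
    int addr = s->i1 + offset;
    assert(addr >= 0 && addr < s->i1);
    return s->dm[addr];
}

/* Pop: just move I1 back by `count` locations. */
void pop(DagStack *s, int count)
{
    assert(count >= 0 && count <= s->i1);
    s->i1 -= count;
}

static int f2_seen; /* what the stand-in f2 read as its argument */

void f2(DagStack *s)
{
    f2_seen = peek(s, -1);
}

void f1(DagStack *s)
{
    int r0 = peek(s, -1); /* load argument a from the stack */
    push(s, r0);          /* push f2's argument */
    f2(s);                /* call f2 */
    pop(s, 1);            /* pop f2's argument before returning */
}
```

A caller pushes the argument, calls `f1`, and pops the argument afterwards; the return address never appears in this stack because the PC stack holds it.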


The SHARC allows several operations to be performed simultaneously.

Many machines offer parallel execution, but it is hidden from the programmer.

The SHARC's wide instruction word allows the programmer to put together parallel operations.


The machine supports both memory parallelism and operation parallelism.

This reduces the number of instructions required for common operations.

For example, the basic operation in a dot product loop can be performed in one cycle that performs two fetches, a multiplication, and an addition.


The modified Harvard architecture allows multiple data fetches in a single instruction.

The most common instructions allow a memory reference and a computation to be performed at the same time.

Memory references can be done two at a time in many instructions, with each reference using a DAG.


The instruction set allows several of the CPU's function units to be used in a single instruction:

fixed-point multiply-accumulate and add, subtract, or average;

floating-point multiplication and ALU operation; and

multiplication and dual add-subtract.


There are restrictions on the sources of the operands when operations are combined.

The operands going to the multiplier must come from R0 through R7 (or, in the case of floating-point operands, f0 to f7), with one input coming from R0-R3/f0-f3 and the other from R4-R7/f4-f7.


The ALU operands must come from R8-R15/f8-f15, with one operand coming from R8-R11/f8-f11 and the other from R12-R15/f12-f15.

The following instruction performs three operations:

R6 = R0 * R4, R9 = R8 + R12, R10 = R8 - R12;
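The operand restrictions for combined operations can be expressed as a small checker in C. This is a sketch only: the function names are hypothetical, registers are identified by their number (0 for R0 or f0, and so on), and the checker accepts the two operands in either order, which is an assumption, since the text only says that one operand comes from each group; an actual assembler may require a fixed order.

```c
#include <assert.h>
#include <stdbool.h>

/* True if register number r lies in the inclusive range [lo, hi]. */
static bool in_range(int r, int lo, int hi)
{
    return r >= lo && r <= hi;
}

/* True if x and y come from the two different groups, in either order. */
static bool one_from_each(int x, int y, int lo1, int hi1, int lo2, int hi2)
{
    return (in_range(x, lo1, hi1) && in_range(y, lo2, hi2)) ||
           (in_range(y, lo1, hi1) && in_range(x, lo2, hi2));
}

/* Multiplier inputs: one from R0-R3, the other from R4-R7. */
bool mul_operands_ok(int x, int y)
{
    return one_from_each(x, y, 0, 3, 4, 7);
}

/* ALU inputs: one from R8-R11, the other from R12-R15. */
bool alu_operands_ok(int x, int y)
{
    return one_from_each(x, y, 8, 11, 12, 15);
}
```

For the three-operation instruction above, `mul_operands_ok(0, 4)` and `alu_operands_ok(8, 12)` both hold, whereas a product of two registers from the same group, such as `mul_operands_ok(1, 2)`, is rejected.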


All CPUs are similar in that they read and write memory, perform data operations, and make decisions.

There are many ways to design an instruction set, as illustrated by the differences between the ARM and the SHARC.


When designing complex systems, programs are generally viewed in high-level language form, which hides many of the details of the instruction set.

However, differences in instruction sets can be reflected in nonfunctional characteristics, such as program size and speed.
