ch024
TRANSCRIPT
-
7/30/2019 ch024
1/97
-
Why DSP
DSPs are a special class of microprocessors that are optimized for computing the real-time calculations used in signal processing
DSPs have an architecture that simplifies application designs and makes low-cost signal processing a reality
-
Characteristics
fast, flexible computation units
unconstrained data flow to and from the computation units
extended precision and dynamic range in the computation units
dual address generators
efficient program sequencing and looping mechanisms
-
SHARC family of DSPs
Harvard architecture
one instruction per line
each instruction ends with a semicolon (;)
a label ends with a colon (:)
comments start with an exclamation point (!)
-
Instructions example
R1 = DM(M0,I0), R2 = PM(M8,I8); ! a comment
Label:
R3 = R1 + R2;
-
Memory
SHARC uses different word sizes and address space sizes for instructions and data
an instruction consists of 48 bits
a basic data word is 32 bits
an address is 32 bits
-
On-chip memory
the 21061 has the smallest on-chip memory, 1 Mbit
internal memory:
- program memory (PM)
- data memory (DM)
-
Types of data
32-bit IEEE single-precision floating-point
40-bit IEEE extended-precision floating-point
32-bit integers
-
SHARC memory
the architecture allows the program memory to hold both data and instructions
this allows extra data to be squeezed into the on-chip memory
it also allows data to be fetched from both memories in parallel
-
SHARC memory
The PM bus is used to access either instructions or data
During a single cycle the processor can access two data operands, one over the PM bus and one over the DM bus
-
SHARC memory
Each DAG keeps track of up to eight address pointers, eight modifiers and eight length values
A pointer used for indirect addressing can be modified by a value in a specified register
-
SHARC programming model
The primary data registers are r0-r15, also written f0-f15
R0-R15: used for integer operations
F0-F15: used for floating-point operations
registers are 40 bits long so that they can hold either data type:
- a 40-bit extended-precision floating-point value
- a 32-bit data type, held in the most-significant bits
-
CPU
CPU has three major data function units: an ALU, a multiplier, and a shifter.
the three most significant registers for data operations:
- arithmetic status (ASTAT)
- sticky (STKY)
- mode 1 (MODE1)
-
The ALU updates seven status flags in the ASTAT register at the end of each operation
The ALU also updates four sticky status flags in the STKY register.
Once set, a sticky flag remains high until explicitly cleared
-
ASTAT
Bit    Name  Definition
0      AZ    ALU result zero or floating-point underflow
1      AV    ALU overflow
2      AN    ALU result negative
3      AC    ALU fixed-point carry
4      AS    ALU X input sign (ABS, MANT operations)
5      AI    ALU floating-point invalid operation
10     AF    Last ALU operation was a floating-point operation
31-24  CACC  Compare Accumulation register (results of last 8 compare operations)
-
STKY
Bit  Name  Definition
0    AUS   ALU floating-point underflow
1    AVS   ALU floating-point overflow
2    AOS   ALU fixed-point overflow
5    AIS   ALU floating-point invalid operation
-
Rn, Rx, and Ry are arbitrary data registers R0-R15
operations set various status bits in the ASTAT and STKY registers
COMP compares two values without modifying any data registers
-
Rn = Rx + Ry            Add
Rn = Rx - Ry            Subtract
Rn = Rx + Ry + CI       Add with carry
Rn = Rx - Ry + CI - 1   Subtract with borrow
Rn = (Rx + Ry)/2        Average
COMP(Rx,Ry)             Compare
-
Rn = Rx + CI       Add carry
Rn = Rx + CI - 1   Add borrow
Rn = Rx + 1        Increment
Rn = Rx - 1        Decrement
Rn = -Rx           Negate
Rn = ABS Rx        Absolute value
Rn = PASS Rx       Copy Rx to Rn
-
Rn = Rx AND Ry       Logical AND
Rn = Rx OR Ry        Logical OR
Rn = Rx XOR Ry       Logical exclusive OR
Rn = NOT Rx          Logical negate
Rn = MIN(Rx,Ry)      Minimum of Rx, Ry
Rn = MAX(Rx,Ry)      Maximum of Rx, Ry
Rn = CLIP Rx by Ry   Clip Rx within range [-Ry,Ry]
-
All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.
The STKY register is a sticky version of the ASTAT register.
STKY bits are set along with the ASTAT register bits, but are not cleared automatically.
STKY bits always remain set until cleared by an instruction.
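The difference between an ordinary status bit and a sticky bit can be sketched in C. This is only a model under stated assumptions: the struct and the function names `alu_record` and `stky_clear` are hypothetical, and the bit positions are taken from the ASTAT and STKY tables (AV is bit 1 of ASTAT, AOS is bit 2 of STKY).

```c
#include <stdint.h>

/* Bit positions as listed in the ASTAT and STKY tables. */
#define ASTAT_AV (1u << 1)   /* ALU overflow */
#define STKY_AOS (1u << 2)   /* ALU fixed-point overflow, sticky */

/* Hypothetical model of the two status registers. */
typedef struct {
    uint32_t astat;  /* reflects only the most recent operation */
    uint32_t stky;   /* bits accumulate until explicitly cleared */
} alu_status;

/* Record the outcome of one fixed-point ALU operation. */
void alu_record(alu_status *s, int overflowed)
{
    if (overflowed) {
        s->astat |= ASTAT_AV;
        s->stky  |= STKY_AOS;   /* set once, then left alone */
    } else {
        s->astat &= ~ASTAT_AV;  /* ASTAT follows the latest result */
    }
}

/* Stand-in for the instruction that clears sticky bits. */
void stky_clear(alu_status *s, uint32_t mask)
{
    s->stky &= ~mask;
}
```

After one overflowing operation followed by a clean one, `astat` no longer shows AV but `stky` still shows AOS, which is what lets a program check once at the end of a long computation.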
-
Saturation Mode
The SHARC can perform saturation arithmetic on fixed-point values.
all positive fixed-point overflows cause the maximum positive fixed-point number (0x7FFF FFFF) to be returned, and all negative overflows cause the maximum negative number (0x8000 0000) to be returned
-
Saturation Mode
In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range.
Saturation mode is controlled by the ALUSAT bit in the MODE1 register
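A minimal C sketch of the two overflow behaviours, assuming 32-bit fixed-point operands. The function name `alu_add` and its `alusat` parameter are hypothetical; the parameter simply stands in for the ALUSAT mode bit.

```c
#include <stdint.h>

/* Add two 32-bit fixed-point values.
 * alusat != 0: clamp to the range limits on overflow.
 * alusat == 0: wrap around the numeric range. */
int32_t alu_add(int32_t x, int32_t y, int alusat)
{
    int64_t wide = (int64_t)x + (int64_t)y;   /* exact sum, cannot overflow */

    if (alusat) {
        if (wide > INT32_MAX) return INT32_MAX;   /* 0x7FFFFFFF */
        if (wide < INT32_MIN) return INT32_MIN;   /* 0x80000000 */
        return (int32_t)wide;
    }
    /* Unsigned addition is defined modulo 2^32; converting back to
     * int32_t is implementation-defined but wraps on common compilers. */
    return (int32_t)((uint32_t)x + (uint32_t)y);
}
```

With saturation on, `alu_add(INT32_MAX, 1, 1)` stays at the maximum positive value; with it off, the same sum wraps to the most negative value.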
-
SHARC doesn't have a divide instruction
Iterative algorithms are used to compute both reciprocals and square roots.
The RECIPS and RSQRTS operations are used to start these iterative algorithms
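One common iterative scheme of this kind is Newton-Raphson, which refines an estimate x of 1/d by x = x * (2 - d * x) using only multiplies and subtracts. The sketch below is an illustration, not the SHARC routine: the function name `recip` is hypothetical, the linear seed stands in for what RECIPS would supply, and d is assumed to be positive, finite, and of moderate magnitude.

```c
/* Reciprocal of d without a divide (d > 0 assumed). */
float recip(float d)
{
    float m = d;
    float scale = 1.0f;

    /* Normalize m into [0.5, 1), tracking the power of two removed,
     * so that m == d * scale throughout. */
    while (m >= 1.0f) { m *= 0.5f; scale *= 0.5f; }
    while (m < 0.5f)  { m *= 2.0f; scale *= 2.0f; }

    /* Rough linear seed for 1/m (48/17 - 32/17 * m); a stand-in for
     * the seed instruction, accurate to about 6 percent. */
    float x = 2.8235294f - 1.8823529f * m;

    /* Each Newton-Raphson step roughly squares the relative error. */
    for (int i = 0; i < 3; i++)
        x = x * (2.0f - m * x);

    return x * scale;   /* 1/d = (1/m) * scale */
}
```

Three steps are enough here to reach single-precision accuracy from the 6 percent seed; a better seed would need fewer steps.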
-
Floating-Point Rounding Modes
If the TRUNC bit is set, the ALU rounds a result to zero (truncation). If the TRUNC bit is cleared, the ALU rounds to nearest.
The rounding modes used for floating-point arithmetic are controlled by two bits in the MODE1 register
-
Multiplication sets the MN (multiplier result negative), MV (multiplier overflow), MU (multiplier floating-point underflow), and MI (multiplier floating-point invalid operation) bits in the ASTAT register.
-
Fn = Fx + Fy        Add
Fn = Fx - Fy        Subtract
Fn = ABS(Fx + Fy)   Absolute value of sum
Fn = ABS(Fx - Fy)   Absolute value of difference
Fn = (Fx + Fy)/2    Average
COMP(Fx,Fy)         Compare
Fn = -Fx            Negate
-
Fn = ABS Fx                          Absolute value
Fn = PASS Fx                         Copy Fx to Fn
Fn = RND Fx                          Round
Fn = SCALE Fx by Ry                  Scale exponent of Fx by Ry
Rn = MANT Fx                         Extract mantissa of Fx
Rn = LOGB Fx                         Convert exponent of Fx to integer
Rn = FIX Fx, Rn = TRUNC Fx           Convert floating-point to integer
Fn = FLOAT Rx by Ry, Fn = FLOAT Rx   Convert integer to floating-point
-
Fn = RECIPS Fx        Create seed for reciprocal
Fn = RSQRTS Fx        Create seed for reciprocal square root
Fn = Fx COPYSIGN Fy   Copy sign of Fy to Fx
Fn = MIN(Fx,Fy)       Minimum of Fx, Fy
Fn = MAX(Fx,Fy)       Maximum of Fx, Fy
Fn = CLIP Fx by Fy    Clip Fx within range [-Fy,Fy]
-
The multiplier performs fixed-point and floating-point multiplication.
It can also perform saturation, rounding, and setting the result to 0.
Fixed-point multiplication produces an 80-bit result
-
Logical shifts fill with zeroes, while arithmetic shifts copy sign bits.
The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift.
Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
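The signed shift distance and the two fill rules can be modelled in C as follows. The function names `lshift` and `ashift` are hypothetical, and the treatment of distances of 32 or more is an assumption of this sketch rather than documented hardware behaviour.

```c
#include <stdint.h>

/* Logical shift: positive distance shifts left, negative shifts
 * right; vacated positions are filled with zeroes. */
uint32_t lshift(uint32_t x, int dist)
{
    if (dist >= 32 || dist <= -32)
        return 0;                       /* every bit shifted out */
    return dist >= 0 ? x << dist : x >> -dist;
}

/* Arithmetic shift: right shifts copy the sign bit into the
 * vacated positions. */
int32_t ashift(int32_t x, int dist)
{
    if (dist >= 0)
        return (int32_t)lshift((uint32_t)x, dist);
    if (dist <= -32)
        return x < 0 ? -1 : 0;          /* only sign bits remain */

    /* Build the sign fill by hand, because >> on a negative signed
     * value is implementation-defined in C. */
    uint32_t u = (uint32_t)x >> -dist;
    if (x < 0)
        u |= ~(UINT32_MAX >> -dist);
    return (int32_t)u;
}
```

For example, `ashift(-8, -1)` gives -4 because the sign bit is copied in, while `lshift` on the same bit pattern would shift in a zero.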
-
Rn = LSHIFT Rx by Ry         Logical shift distance Ry
Rn = Rn OR LSHIFT Rx by Ry   Logical shift and logical OR
Rn = ASHIFT Rx by Ry         Arithmetic shift
Rn = Rn OR ASHIFT Rx by Ry   Arithmetic shift and logical OR
Rn = ROT Rx by Ry            Rotate distance Ry
Rn = BCLR Rx by Ry           Clear one bit in Rx
Rn = BSET Rx by Ry           Set one bit in Rx
Rn = BTGL Rx by Ry           Toggle one bit in Rx
-
Rn = EXP Rx (EX)   Extract exponent field from ALU
Rn = LEFTZ Rx      Extract number of leading 0s
Rn = LEFTO Rx      Extract number of leading 1s
Rn = FPACK Fx      Convert 32-bit floating-point to 16-bit floating-point
Fx = FUNPACK Rn    Convert 16-bit floating-point to 32-bit floating-point
-
Ex2-7 Data Operation Status Bits in the SHARC
fixed-point ALU calculation -1 + 1 = 0,
ASTAT status bits are set: AZ = 1, AU = 0, AN = 0, AV = 0, AC = 1, and AI = 0.
floating-point operation -1E0 + 1E0 = 0E0: AOS (ALU fixed-point overflow) will be similarly set.
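The fixed-point result in this example can be reproduced with a small C model of a 32-bit adder. The struct and the function name `add_flags` are hypothetical, and only the four flags that follow directly from two's-complement arithmetic are modelled.

```c
#include <stdint.h>

/* Hypothetical container for four of the ALU status flags. */
typedef struct { int az, an, av, ac; } alu_flags;

/* 32-bit fixed-point add that also reports zero, negative,
 * overflow, and carry. */
int32_t add_flags(int32_t x, int32_t y, alu_flags *f)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
    uint64_t wide = (uint64_t)ux + (uint64_t)uy;
    uint32_t r = (uint32_t)wide;

    f->ac = (int)((wide >> 32) & 1u);   /* carry out of bit 31 */
    f->az = (r == 0);                   /* result is zero */
    f->an = (int)(r >> 31);             /* sign bit of the result */
    /* Signed overflow: the operands share a sign that the result
     * does not. */
    f->av = (int)((~(ux ^ uy) & (ux ^ r)) >> 31);

    return (int32_t)r;
}
```

Calling `add_flags(-1, 1, &f)` returns 0 with `az` = 1, `an` = 0, `av` = 0, and `ac` = 1: the carry is set because 0xFFFFFFFF + 1 carries out of the top bit even though no signed overflow occurs.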
-
Ex2-7 Data Operation Status Bits in the SHARC
fixed-point multiplier operation -2 * 3,
ASTAT bits are set as follows:
MN = 1, MV = 0, MU = 1, and MI = 0.
the multiplier has four STKY bits, none of which will be set:
- MOS (multiplier fixed-point overflow)
- MVS (multiplier floating-point overflow)
- MUS (multiplier floating-point underflow)
- MIS (multiplier floating-point invalid operation)
-
Ex2-7 Data Operation Status Bits in the SHARC
For the following shifter operation,
LSHIFT 0x7fffffff BY 3
ASTAT bits will be set as follows: SZ = 0, SV = 1, and SS = 0.
The shifter has no sticky bits.
-
Operands must be loaded into registers before operating on them.
SHARC supplies special registers that are used to control loading and storing.
SHARC has two data address generators (DAGs), one for the data memory and the other for the program memory.
-
DAGs
Data address generator 1 (DAG1) generates 32-bit addresses on the DM Address Bus
Data address generator 2 (DAG2) generates 24-bit addresses on the PM Address Bus
Each DAG has four types of registers: Index (I), Modify (M), Base (B), and Length (L) registers
-
DAGs
An I register acts as a pointer to memory
An M register contains the increment value for advancing the pointer.
B registers and L registers are used only for circular data buffers.
A B register holds the base address (i.e. the first address) of a circular buffer.
An L register contains the number of locations in (i.e. the length of) the circular buffer.
-
DAGs
With two DAGs, the SHARC can perform two load-store operations per cycle.
DAG hardware automatically updates the register values so that a series of accesses can be very easily performed.
DAGs are quite useful for sequential accesses
-
DAGs
Each data address generator has eight sets of primary registers.
Having several sets allows for quicker access of multiple sets of data
The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.
-
DAGs
DAGs provide the following addressing modes
immediate value
R0 = DM(0x2000000);
R0 = DM(_a); ! loads R0 with the contents of the variable a
DM(_a) = R0; ! stores R0 into memory location a
-
DAGs
this mode has the entire address in the instruction
address bits take up most of the instruction, 32 bits/40 bits
-
Post-modify mode
sweeps through a range of addresses
uses an I register and a modifier, either an M register or an immediate value.
the I register specifies the address and is then updated by the modifier value
R0 = DM(I3,M1);
DM(I2,1) = R1;
-
Base-plus-offset addressing
address of the location to be fetched is computed as I + M, where I is the base and M is the modifier or offset
with I0 = 0x2000000 and M1 = 4,
R0 = DM(M1,I0);
loads DM(0x2000004) into R0
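The two access forms, DM(I,M) with the index first and DM(M,I) with the modifier first, can be contrasted with a short C model. The type `dag_model` and both function names are hypothetical; the sketch assumes that the base-plus-offset form leaves the I register unchanged, and it performs no bounds checking.

```c
#include <stdint.h>

#define MEM_WORDS 16

/* Hypothetical model: a small memory plus one index register. */
typedef struct {
    int32_t  mem[MEM_WORDS];
    uint32_t i;               /* index (I) register */
} dag_model;

/* DM(I,M): access at I, then update I by the modifier. */
int32_t load_postmodify(dag_model *d, int32_t m)
{
    int32_t v = d->mem[d->i];
    d->i += (uint32_t)m;      /* pointer advances for the next access */
    return v;
}

/* DM(M,I): access at I + M; I itself is left alone (assumption). */
int32_t load_premodify(const dag_model *d, int32_t m)
{
    return d->mem[d->i + (uint32_t)m];
}
```

Repeated calls to `load_postmodify` walk through an array one element at a time, while `load_premodify` reads a field at a fixed offset from a base that stays put.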
-
A circular buffer is an array of n elements; when the n + 1th element is referenced, the reference goes to buffer location 0, wrapping around from the end to the beginning of the buffer.
The L register is set with a positive, nonzero value (the length of the buffer), and the I register of the same number holds the starting point in the circular buffer.
The B register of the same number is loaded with the base address of the circular buffer.
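The wraparound can be sketched in C using the three register roles described for circular buffers: B for the base, L for the length, and I for the moving pointer. The type `circ_regs` and the function `circ_postmodify` are hypothetical, and the sketch assumes the step size is smaller in magnitude than the buffer length.

```c
#include <stdint.h>

/* Hypothetical model of one B/L/I register triple. */
typedef struct {
    uint32_t b;   /* base address of the buffer */
    uint32_t l;   /* number of locations, must be > 0 */
    uint32_t i;   /* current pointer, b <= i < b + l */
} circ_regs;

/* Return the current address, then advance I by m, wrapping so that
 * I stays inside [b, b + l). Assumes |m| < l. */
uint32_t circ_postmodify(circ_regs *r, int32_t m)
{
    uint32_t addr = r->i;
    int64_t next = (int64_t)r->i + m;

    if (next >= (int64_t)r->b + r->l)
        next -= r->l;          /* ran off the end: wrap to the start */
    else if (next < (int64_t)r->b)
        next += r->l;          /* ran off the start: wrap to the end */

    r->i = (uint32_t)next;
    return addr;
}
```

With a base of 100 and a length of 4, successive calls with a step of 1 yield addresses 100, 101, 102, 103, and then 100 again.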
-
Bit-reversal addressing supports the fast Fourier transform (FFT)
Bit-reversal addressing can be performed only in I0 and I8, as controlled by the BR0 and BR8 bits in the MODE1 register.
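What bit reversal does to an index can be shown with a few lines of C. This is a software illustration of the permutation an FFT needs, with a hypothetical function name `bit_reverse`; it reverses only the low `bits` bits of an index and does not model how the hardware applies the reversal to a full address.

```c
#include <stdint.h>

/* Reverse the low `bits` bits of index (1 <= bits <= 32 assumed).
 * For an FFT of N = 2^bits points, this maps natural order to
 * bit-reversed order. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t out = 0;
    for (unsigned k = 0; k < bits; k++) {
        out = (out << 1) | (index & 1u);   /* move the lowest bit over */
        index >>= 1;
    }
    return out;
}
```

For an 8-point transform (3 bits), index 1 (001) maps to 4 (100) and index 3 (011) maps to 6 (110).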
-
The SHARC allows data to be stored in the program memory
this allows two data fetches per cycle
F0 = DM(M0,I0), F1 = PM(M8,I9);
simultaneously loads F0 from data memory and F1 from program memory
-
float dm a[N];
float pm b[N];
will place the a[] array in data memory and b[] in program memory
-
Ex2-8 C Assignments in SHARC Instructions
x = (a + b) - c;
r0 for a, r1 for b, r2 for c, and r3 for x
R0 = DM(_a); ! get value of a
R1 = DM(_b); ! load value of b
R3 = R0 + R1; ! set result for x to a + b
R2 = DM(_c); ! get value of c
R3 = R3 - R2; ! complete computation of x
DM(_x) = R3; ! store x at proper location
-
Ex2-8 C Assignments in SHARC Instructions
y = a*(b + c);
use r0 for a, r1 for b, and r2 for both c and y
R1 = DM(_b); ! load b
R2 = DM(_c); ! load c
R2 = R1 + R2; ! compute partial result for y
R0 = DM(_a); ! load a
R2 = R2 * R0; ! compute final value of y
DM(_y) = R2; ! store y
-
Ex2-8 C Assignments in SHARC Instructions
z = (a << 2) | (b & 15);
r0 for a and z, r1 for b, and r3 to hold the bit mask to be ANDed
R0 = DM(_a); ! get value of a
R0 = LSHIFT R0 BY #2; ! perform shift
R1 = DM(_b); ! get value of b
R3 = #15; ! set up the bit mask for ANDing
R1 = R1 AND R3; ! perform logical AND
R0 = R1 OR R0; ! compute final value of z
DM(_z) = R0; ! store value of z
-
JUMP instruction
jumps to the location foo
- JUMP foo;
Direct: specifies a 24-bit address as an immediate value
Indirect: the address is supplied by the DAG2 data address generator.
PC-relative: specifies an immediate value that is added to the current PC.
-
loop instruction
LCNTR = n, DO Label UNTIL LCE;
loop instruction specifies the following:
- length of the loop, loop counter LCNTR
- Label, the address for the last instruction in the loop
- loop termination condition LCE, which stands for "loop counter expired"
-
True version  Description
EQ            ALU = 0
LT            ALU
LE
AC
AV
-
MV        Multiplier overflow  NOT MV
MS        Multiplier sign      NOT MS
SV        Shifter overflow     NOT SV
SZ        Shifter zero         NOT SZ
FLAG0_IN  Flag 0 input         NOT FLAG0_IN
-
FLAG1_IN  Flag 1 input              NOT FLAG1_IN
FLAG2_IN  Flag 2 input              NOT FLAG2_IN
FLAG3_IN  Flag 3 input              NOT FLAG3_IN
TF        Bit test flag             NOT TF
LCE       Loop counter expired
NOT LCE   Loop counter not expired
-
Ex2-9 if statement
if (a > b) {
    x = 5;
    y = c + d;
}
else x = c - d;
-
Ex2-9 if statement
! test
R0 = DM(_a); ! load a
R1 = DM(_b); ! load b
COMP(R0,R1); ! compare a,b
IF LE JUMP fblock; ! jump if a > b fails, i.e. a <= b
! true block
-
Ex2-9 if statement
tblock: R0 = 5; ! get value for x
DM(_x) = R0; ! store value for x
R0 = DM(_c); ! get c
R1 = DM(_d); ! get d
R1 = R0 + R1; ! compute c + d
DM(_y) = R1; ! save value for y
JUMP other; ! skip false block
-
Ex2-9 if statement
! false block
fblock: R0 = DM(_c); ! get c
R1 = DM(_d); ! get d
R1 = R0 - R1; ! compute c - d
DM(_x) = R1; ! save value for x
other: ... ! code after if
-
Ex2-9 if statement
if (a > b)
    y = c - d;
else
    y = c + d;
-
Ex2-9 if statement
! load values
R1 = DM(_a); ! load a
R8 = DM(_b); ! load b
R2 = DM(_c); ! load c
R4 = DM(_d); ! load d
! compute both sum and difference
-
Ex2-9 if statement
r12 = r2 + r4, r0 = r2 - r4;
! choose which one to save, copy it into r0 if necessary, then write to y
comp(r8,r1); ! compare b,a
if ge r0 = r12; ! a
-
When control reaches the last instruction in the loop, the machine immediately returns to the head of the loop unless the loop counter has expired.
zero-overhead loop: the jump back to the top of the loop (and its associated delays) is avoided.
-
loop instruction: uses two stacks to handle nested loops (one loop contained inside another).
The PC is in fact a stack; a separate stack holds the loop counters for all active loops.
The PC stack is 30 deep and holds subroutine return addresses and loop addresses; the loop counter stack is 6 deep.
-
When the DO UNTIL is first encountered,
- the loop end address is pushed onto the PC stack
- the new loop counter value is pushed onto the loop counter stack.
When execution reaches the loop end address,
- the CPU automatically decrements the loop counter and checks its value.
-
If the termination condition (which may be LCE or NOT LCE) is not satisfied, the PC is set to the instruction just after the DO UNTIL for another iteration.
If the condition is satisfied, the two stacks are popped and execution continues at the instruction after the loop end address.
-
loop
for (i = 0, f = 0; i < N; i++)
    f = f + c[i] * x[i];
! loop setup
I0 = _a; ! I0 points to a[0]
M0 = 1; ! set up increment
I8 = _b; ! I8 points to b[0]
M8 = 1; ! set up postincrement mode
-
loop
! loop body
LCNTR = N, DO loopend UNTIL LCE;
! use postincrement mode
R1 = DM(I0,M0), R2 = PM(I8,M8);
R8 = R1*R2; ! multiply
loopend: R12 = R12 + R8; ! accumulate
-
loop
optimized:
! loop setup
I4 = _a; ! load a
I12 = _b; ! load b
R4 = R4 xor R4, R1 = DM(I4,M6), R2 = PM(I12,M14);
MR0F = R4, MODIFY(I7,M7);
-
loop
! start loop
LCNTR = 20, DO (PC,loop) UNTIL LCE;
loop: MRF = MRF + R2*R1 (SSI), R1 = DM(I4,M6), R2 = PM(I12,M14);
! loop clean-up
R0 = MR0F;
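For comparison, the same multiply-accumulate pattern can be written in portable C. This is a sketch under stated assumptions: the function name `dot_fixed` is hypothetical, the operands are taken to be 32-bit integers, and a 64-bit accumulator stands in for the wider multiplier result register.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products of two integer arrays of length n. */
int64_t dot_fixed(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;                       /* cleared accumulator */
    for (size_t i = 0; i < n; i++) {
        /* Two post-incremented fetches feed one multiply-accumulate,
         * mirroring the two parallel loads in the loop body. */
        acc += (int64_t)(*a++) * (int64_t)(*b++);
    }
    return acc;
}
```

Each pass of the C loop corresponds to one pass of the single-instruction hardware loop: fetch one element from each array, multiply, and add into the running sum.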
-
procedure calls: CALL foo;
a call can be executed conditionally
IF GT CALL (PC,100);
a PC-relative call to a point 100 locations past the current PC value.
The CALL instruction pushes the current PC value plus 1 onto the PC stack before jumping to the target address.
void f1(int a) {
    f2(a);
}
The SHARC has a PC stack, so a procedure does not need to push the return address, only the registers.
The SHARC does not have general-purpose stack operators, but the DAGs can be used to implement a stack with little effort.
Pushing onto the stack uses postincrement mode, so the I register automatically points to the empty location at the top of the stack.
Reading values off the stack requires specifying a constant offset in the M field to provide the distance from the end of the stack frame to the variable.
Popping the stack means modifying the I register.
Use I1 to point to the stack, and assume that M1 has been set to 1, the stack push increment, at the start of the program.
Here is handwritten code for f1(), which includes a call to f2():
f1: R0 = DM(I1,-1); ! load argument a into R0 from stack
! call f2()
DM(I1,M1) = R0; ! push f2's argument onto the stack
CALL f2; ! call f2
! return from f1()
MODIFY(I1,-1); ! pop one element off stack
RTS; ! return
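The stack discipline used by f1() can be sketched in C as follows. This is only a model under stated assumptions: the array stack_mem, its size STACK_WORDS, the helpers push, peek, pop and stack_depth, and the global f2_seen_arg are hypothetical names introduced for illustration, and the index i1 stands in for the I1 register.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 64            /* hypothetical stack size */

static int stack_mem[STACK_WORDS];
static size_t i1 = 0;             /* models I1: index of the next empty slot */

int f2_seen_arg;                  /* records what the model f2() read */

/* DM(I1,M1) = value with M1 = 1: store, then postincrement. */
void push(int value)
{
    assert(i1 < STACK_WORDS);
    stack_mem[i1] = value;
    i1 += 1;
}

/* Read the word `offset` slots below the top without moving I1. */
int peek(size_t offset)
{
    assert(offset >= 1 && offset <= i1);
    return stack_mem[i1 - offset];
}

/* MODIFY(I1,-count): popping only adjusts the pointer. */
void pop(size_t count)
{
    assert(count <= i1);
    i1 -= count;
}

size_t stack_depth(void) { return i1; }

/* Stand-in for f2(): it just reads its argument from the stack. */
void f2(void) { f2_seen_arg = peek(1); }

void f1(void)
{
    int a = peek(1);              /* load argument a from the stack */
    push(a);                      /* push f2's argument */
    f2();                         /* CALL f2 */
    pop(1);                       /* pop one element off the stack */
}
```

A caller would push the argument, call f1(), and pop the argument afterwards; the return address never appears on this stack because the hardware PC stack holds it.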
The SHARC allows operations to be performed simultaneously.
Many machines offer parallel execution, but it is hidden from the programmer.
The SHARC's wide instruction word allows the programmer to put together parallel operations.
The machine supports both memory parallelism and operation parallelism.
These reduce the number of instructions required for common operations.
For example, the basic operation in a dot product loop can be performed in one cycle that performs two fetches, a multiplication, and an addition.
The modified Harvard architecture allows multiple data fetches in a single instruction.
The most common instructions allow a memory reference and a computation to be performed at the same time.
Memory references can be done two at a time in many instructions, with each reference using a DAG.
The instruction set allows several of the CPU's function units to operate in a single instruction. The combinations are:
fixed-point multiply-accumulate and add, subtract, or average;
floating-point multiplication and ALU operation; and
multiplication and dual add-subtract.
There are restrictions on the sources of the operands when operations are combined.
The operands going to the multiplier must come from R0 through R7 (or, in the case of floating-point operands, f0 to f7), with one input coming from R0-R3/f0-f3 and the other from R4-R7/f4-f7.
The ALU operands must come from R8-R15/f8-f15, with one operand coming from R8-R11/f8-f11 and the other from R12-R15/f12-f15.
This instruction performs three operations:
R6 = R0 * R4, R9 = R8 + R12, R10 = R8 - R12;
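The operand restrictions can be expressed as a small check, sketched here in C. The function names are hypothetical, registers are identified by their number only, and the sketch assumes the two inputs may appear in either order, which the slide does not state.

```c
#include <stdbool.h>

/* True when register number r lies in the inclusive range lo..hi. */
static bool in_range(int r, int lo, int hi)
{
    return r >= lo && r <= hi;
}

/* Multiplier inputs: one from R0-R3 and the other from R4-R7. */
bool multiplier_sources_ok(int x, int y)
{
    return (in_range(x, 0, 3) && in_range(y, 4, 7)) ||
           (in_range(y, 0, 3) && in_range(x, 4, 7));
}

/* ALU inputs: one from R8-R11 and the other from R12-R15. */
bool alu_sources_ok(int x, int y)
{
    return (in_range(x, 8, 11) && in_range(y, 12, 15)) ||
           (in_range(y, 8, 11) && in_range(x, 12, 15));
}
```

For the three-operation instruction above, the multiplier reads R0 and R4 and both ALU operations read R8 and R12, so both checks pass.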
All CPUs are similar: they read and write memory, perform data operations, and make decisions.
There are many ways to design an instruction set, as illustrated by the differences between the ARM and the SHARC.
When designing complex systems, programs are usually viewed in high-level language form, which hides many of the details of the instruction set.
However, differences in instruction sets can be reflected in nonfunctional characteristics, such as program size and speed.