ch024

Upload: jeeva-priya

Post on 04-Apr-2018

TRANSCRIPT

  • 7/30/2019 ch024

    1/97

  • 7/30/2019 ch024

    2/97

    Why DSP

    DSPs are a special class of microprocessors that are optimized for computing the real-time calculations used in signal processing.

    DSPs have an architecture that simplifies application designs and makes low-cost signal processing a reality.

  • 7/30/2019 ch024

    3/97

    characteristics

    - fast, flexible computation units
    - unconstrained data flow to and from the computation units
    - extended precision and dynamic range in the computation units
    - dual address generators
    - efficient program sequencing and looping mechanisms

  • 7/30/2019 ch024

    4/97

    SHARC family of DSPs

    Harvard architecture

    - one instruction per line
    - each instruction ends with a semicolon (;)
    - a label ends with a colon (:)
    - comments start with an exclamation point (!)

  • 7/30/2019 ch024

    5/97

    Instructions example

    R1 = DM(M0,I0), R2 = PM(M8,I8);   ! a comment
    Label:
    R3 = R1 + R2;

  • 7/30/2019 ch024

    6/97

  • 7/30/2019 ch024

    7/97

    memory

    SHARC uses different word sizes and address space sizes for instructions and data:

    - an instruction consists of 48 bits
    - the basic data word is 32 bits
    - an address is 32 bits

  • 7/30/2019 ch024

    8/97

    on-chip memory

    The 21061, the smallest in the family, has 1 Mbit of on-chip memory.

    The internal memory is divided into program memory (PM) and data memory (DM).

  • 7/30/2019 ch024

    9/97

    types of data

    - 32-bit IEEE single-precision floating-point
    - 40-bit IEEE extended-precision floating-point
    - 32-bit integers

  • 7/30/2019 ch024

    10/97

    SHARC memory

    The SHARC allows the program memory to hold both data and instructions. This

    - allows extra data to be squeezed into the on-chip memory
    - allows data to be fetched from both memories in parallel

  • 7/30/2019 ch024

    11/97

    SHARC memory

    The PM bus is used to access either instructions or data.

    During a single cycle the processor can access two data operands, one over the PM bus and one over the DM bus.

  • 7/30/2019 ch024

    12/97

  • 7/30/2019 ch024

    13/97

    SHARC memory

    Each DAG keeps track of up to eight address pointers, eight modifiers, and eight length values.

    A pointer used for indirect addressing can be modified by a value in a specified register.

  • 7/30/2019 ch024

    14/97

  • 7/30/2019 ch024

    15/97

    SHARC programming model

    The primary data registers are r0-r15 or f0-f15:

    - R0-R15: used for integer operations
    - F0-F15: used for floating-point operations

    The registers are 40 bits long and hold either data type:

    - a 40-bit extended-precision floating-point value
    - 32-bit data types, stored in the most-significant bits

  • 7/30/2019 ch024

    16/97

  • 7/30/2019 ch024

    17/97

    CPU

    The CPU has three major data function units: an ALU, a multiplier, and a shifter.

    The three most significant mode registers for data operations are:

    - arithmetic status (ASTAT)
    - sticky (STKY)
    - mode 1 (MODE1)

  • 7/30/2019 ch024

    18/97

    The ALU updates seven status flags in the ASTAT register at the end of each operation.

    The ALU also updates four sticky status flags in the STKY register.

    Once set, a sticky flag remains high until explicitly cleared.

  • 7/30/2019 ch024

    19/97

    ASTAT

    Bit     Name    Definition
    0       AZ      ALU result zero or floating-point underflow
    1       AV      ALU overflow
    2       AN      ALU result negative
    3       AC      ALU fixed-point carry
    4       AS      ALU X input sign (ABS, MANT operations)
    5       AI      ALU floating-point invalid operation
    10      AF      Last ALU operation was a floating-point operation
    31-24   CACC    Compare Accumulation register (results of last 8 compare operations)

  • 7/30/2019 ch024

    20/97

    STKY

    Bit     Name    Definition
    0       AUS     ALU floating-point underflow
    1       AVS     ALU floating-point overflow
    2       AOS     ALU fixed-point overflow
    5       AIS     ALU floating-point invalid operation

  • 7/30/2019 ch024

    21/97

    Rn, Rx, and Ry are arbitrary data registers R0-R15.

    The operations set various status bits in the ASTAT and STKY registers.

    COMP compares two values without modifying any data registers.

  • 7/30/2019 ch024

    22/97

    Rn = Rx + Ry             Add
    Rn = Rx - Ry             Subtract
    Rn = Rx + Ry + CI        Add with carry
    Rn = Rx - Ry + CI - 1    Subtract with borrow
    Rn = (Rx + Ry)/2         Average
    COMP(Rx,Ry)              Compare

  • 7/30/2019 ch024

    23/97

    Rn = Rx + CI        Add carry
    Rn = Rx + CI - 1    Add borrow
    Rn = Rx + 1         Increment
    Rn = Rx - 1         Decrement
    Rn = -Rx            Negate
    Rn = ABS Rx         Absolute value
    Rn = PASS Rx        Copy Rx to Rn

  • 7/30/2019 ch024

    24/97

    Rn = Rx AND Ry        Logical AND
    Rn = Rx OR Ry         Logical OR
    Rn = Rx XOR Ry        Logical exclusive OR
    Rn = NOT Rx           Logical negate
    Rn = MIN(Rx,Ry)       Minimum of Rx, Ry
    Rn = MAX(Rx,Ry)       Maximum of Rx, Ry
    Rn = CLIP Rx by Ry    Clip Rx within range [-Ry,Ry]
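The CLIP row can be read as a symmetric clamp. The following is a minimal C sketch of that reading, not the hardware definition: the function name clip32 is hypothetical, and the sketch assumes the bound y is non-negative and greater than INT32_MIN so that -y is representable.

```c
#include <stdint.h>

/* Hypothetical model of "Rn = CLIP Rx by Ry": clamp x into [-y, y].
 * Assumption: y >= 0, so that -y does not overflow. */
int32_t clip32(int32_t x, int32_t y)
{
    if (x > y)  return y;    /* above the range: return upper bound */
    if (x < -y) return -y;   /* below the range: return lower bound */
    return x;                /* already inside [-y, y] */
}
```

For instance, clip32(10, 5) gives 5, clip32(-10, 5) gives -5, and a value already inside the range passes through unchanged.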

  • 7/30/2019 ch024

    25/97

    All the ALU operations set the AZ (ALU result zero), AN (ALU result negative), AV (ALU result overflow), AC (ALU fixed-point carry), and AI (floating-point invalid) bits in the ASTAT register.

    The STKY register is a sticky version of the ASTAT register.

    STKY bits are set along with the ASTAT register bits, but they are not cleared automatically: they remain set until cleared by an instruction.
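The difference between an ordinary status bit and its sticky counterpart can be sketched in C. This is an illustrative model only: the struct alu_flags and the functions add_update_flags and clear_sticky are hypothetical names, and just one flag pair (overflow, modeled on AV and AOS) is shown.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one ASTAT bit and its sticky STKY partner. */
typedef struct {
    bool av;   /* overflow of the most recent operation (ASTAT-like) */
    bool aos;  /* sticky fixed-point overflow (STKY-like) */
} alu_flags;

/* 32-bit add that rewrites av on every call and only ever sets aos. */
int32_t add_update_flags(alu_flags *f, int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;            /* exact sum */
    bool overflow = (wide > INT32_MAX) || (wide < INT32_MIN);
    f->av = overflow;                 /* reflects this operation only */
    if (overflow)
        f->aos = true;                /* stays set across later operations */
    return (int32_t)(uint32_t)wide;   /* wrapped 32-bit result */
}

/* The sticky bit goes low only when software clears it explicitly. */
void clear_sticky(alu_flags *f)
{
    f->aos = false;
}
```

After an overflowing add followed by a normal add, av is back to false while aos is still true, which is what lets a program check once at the end of a long computation whether any step overflowed.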

  • 7/30/2019 ch024

    26/97

    Saturation Mode

    The SHARC can perform saturation arithmetic on fixed-point values.

    In saturation mode, all positive fixed-point overflows cause the maximum positive fixed-point number (0x7FFF FFFF) to be returned, and all negative overflows cause the maximum negative number (0x8000 0000) to be returned.

  • 7/30/2019 ch024

    27/97

    Saturation Mode

    In saturation arithmetic, an overflow results in the maximum-range value, not the result of wrapping around the numeric range.

    Saturation mode is controlled by the ALUSAT bit in the MODE1 register.
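The contrast between saturating and wrapping on overflow can be sketched in C. The function names sat_add32 and wrap_add32 are hypothetical; the sketch models the behavior described for 32-bit fixed-point values and says nothing about how the hardware implements it.

```c
#include <stdint.h>

/* Saturating 32-bit add: overflow returns the nearest range limit. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;    /* exact sum in 64 bits */
    if (wide > INT32_MAX) return INT32_MAX;    /* 0x7FFFFFFF */
    if (wide < INT32_MIN) return INT32_MIN;    /* 0x80000000 */
    return (int32_t)wide;
}

/* Wrapping 32-bit add for comparison. Unsigned arithmetic is used so the
 * addition itself is well defined; the final conversion back to int32_t is
 * implementation-defined in C and wraps on common two's-complement targets. */
int32_t wrap_add32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

With a = 0x7FFFFFFF and b = 1, the saturating version stays at 0x7FFFFFFF, while the wrapping version jumps to the most negative value, which in a signal path shows up as a large glitch rather than mild clipping.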

  • 7/30/2019 ch024

    28/97

    SHARC doesn't have a divide instruction.

    Iterative algorithms are used to compute both reciprocals and square roots.

    The RECIPS and RSQRTS operations are used to start these iterative algorithms.
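One common iterative scheme for reciprocals is Newton-Raphson refinement, sketched below in C. This is an assumption about the kind of algorithm meant, since the slides do not name one. The functions recip_refine and recip_iterative are hypothetical, and the caller-supplied seed stands in for the rough starting estimate that a seed operation would produce.

```c
/* One Newton-Raphson step toward 1/d: x' = x * (2 - d * x).
 * The relative error is squared at every step. */
double recip_refine(double d, double x)
{
    return x * (2.0 - d * x);
}

/* Refine a rough seed for 1/d a fixed number of times.
 * Assumption: the seed is close enough to 1/d to converge
 * (0 < seed < 2/d for positive d). */
double recip_iterative(double d, double seed, int steps)
{
    double x = seed;
    for (int i = 0; i < steps; i++)
        x = recip_refine(d, x);
    return x;
}
```

A division a / d is then computed as a * recip_iterative(d, seed, steps). Because the error squares on each step, a seed accurate to a few bits needs only a handful of iterations.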

  • 7/30/2019 ch024

    29/97

    Floating-Point Rounding Modes

    If the TRUNC bit is set, the ALU rounds a result to zero (truncation). If the TRUNC bit is cleared, the ALU rounds to nearest.

    The rounding modes used for floating-point arithmetic are controlled by two bits in the MODE1 register.

  • 7/30/2019 ch024

    30/97

    Multiplication sets the MN (multiplier result negative), MV (multiplier overflow), MU (multiplier floating-point underflow), and MI (multiplier floating-point invalid operation) bits in the ASTAT register.

  • 7/30/2019 ch024

    31/97

    Fn = Fx + Fy         Add
    Fn = Fx - Fy         Subtract
    Fn = ABS(Fx + Fy)    Absolute value of sum
    Fn = ABS(Fx - Fy)    Absolute value of difference
    Fn = (Fx + Fy)/2     Average
    COMP(Fx,Fy)          Compare
    Fn = -Fx             Negate

  • 7/30/2019 ch024

    32/97

    Fn = ABS Fx                             Absolute value
    Fn = PASS Fx                            Copy Fx to Fn
    Fn = RND Fx                             Round
    Fn = SCALE Fx by Ry                     Scale exponent of Fx by Ry
    Rn = MANT Fx                            Extract mantissa of Fx
    Rn = LOGB Fx                            Convert exponent of Fx to integer
    Rn = FIX Fx, Rn = TRUNC Fx              Convert floating-point to integer
    Fn = FLOAT Rx by Ry, Fn = FLOAT Rx      Convert integer to floating-point

  • 7/30/2019 ch024

    33/97

    Fn = RECIPS Fx           Create seed for reciprocal
    Fn = RSQRTS Fx           Create seed for reciprocal square root
    Fn = Fx COPYSIGN Fy      Copy sign of Fy to Fx
    Fn = MIN(Fx,Fy)          Minimum of Fx, Fy
    Fn = MAX(Fx,Fy)          Maximum of Fx, Fy
    Fn = CLIP Fx by Fy       Clip Fx within range [-Fy,Fy]

  • 7/30/2019 ch024

    34/97

    The multiplier performs fixed-point and floating-point multiplication.

    It can also perform saturation, rounding, and setting the result to 0.

    Fixed-point multiplication produces an 80-bit result.

  • 7/30/2019 ch024

    35/97

    Logical shifts fill with zeroes, while arithmetic shifts copy sign bits.

    The distance to shift, supplied by the Ry register, may be positive for a left shift or negative for a right shift.

    Shift operations set the SZ (shifter zero), SV (shifter overflow), and SS (shifter input sign) bits in the ASTAT register.
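The signed shift distance and the two fill rules can be modeled in C. The functions lshift_by and ashift_by are hypothetical names, and the handling of distances of 32 or more is an assumption of this sketch rather than documented shifter behavior.

```c
#include <stdint.h>

/* Logical shift: positive dist shifts left, negative shifts right,
 * and vacated bits are filled with zeroes. */
uint32_t lshift_by(uint32_t x, int dist)
{
    if (dist >= 32 || dist <= -32)
        return 0;                         /* everything shifted out */
    return dist >= 0 ? x << dist : x >> -dist;
}

/* Arithmetic shift: same direction rule, but a right shift copies
 * the sign bit into the vacated positions. */
int32_t ashift_by(int32_t x, int dist)
{
    if (dist >= 32)  return 0;
    if (dist <= -32) return x < 0 ? -1 : 0;
    if (dist >= 0)
        return (int32_t)((uint32_t)x << dist);
    /* Portable arithmetic right shift: shift logically, then fill sign bits. */
    uint32_t ux = (uint32_t)x >> -dist;
    if (x < 0)
        ux |= ~(UINT32_MAX >> -dist);     /* set the top -dist bits */
    return (int32_t)ux;
}
```

For the same input 0x80000000 and distance -4, the logical shift yields 0x08000000 while the arithmetic shift yields 0xF8000000, which preserves the sign of a negative two's-complement value.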

  • 7/30/2019 ch024

    36/97

    Rn = LSHIFT Rx by Ry          Logical shift distance Ry
    Rn = Rn OR LSHIFT Rx by Ry    Logical shift and logical OR
    Rn = ASHIFT Rx by Ry          Arithmetic shift
    Rn = Rn OR ASHIFT Rx by Ry    Arithmetic shift and logical OR
    Rn = ROT Rx by Ry             Rotate distance Ry
    Rn = BCLR Rx by Ry            Clear one bit in Rx
    Rn = BSET Rx by Ry            Set one bit in Rx
    Rn = BTGL Rx by Ry            Toggle one bit in Rx

  • 7/30/2019 ch024

    37/97

  • 7/30/2019 ch024

    38/97

    Rn = EXP Rx (EX)     Extract exponent field from ALU
    Rn = LEFTZ Rx        Extract number of leading 0s
    Rn = LEFTO Rx        Extract number of leading 1s
    Rn = FPACK Fx        Convert 32-bit floating-point to 16-bit floating-point
    Fx = FUNPACK Rn      Convert 16-bit floating-point to 32-bit floating-point

  • 7/30/2019 ch024

    39/97

    Ex2-7 Data Operation Status Bits in the SHARC

    For the fixed-point ALU calculation -1 + 1 = 0, the ASTAT status bits are set as follows: AZ = 1, AU = 0, AN = 0, AV = 0, AC = 1, and AI = 0.

    For the floating-point operation -1E0 + 1E0 = 0E0, AOS (ALU fixed-point underflow) will be similarly set.
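The fixed-point case can be checked with a generic two's-complement model in C. The struct add_flags and the function add32_flags are hypothetical; the flag definitions follow ordinary 32-bit adder conventions, which is an assumption and may differ from the hardware in corner cases.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical flag set for a 32-bit fixed-point add. */
typedef struct {
    bool az;  /* result is zero */
    bool an;  /* result is negative (bit 31 set) */
    bool av;  /* signed overflow */
    bool ac;  /* carry out of bit 31 */
} add_flags;

add_flags add32_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                 /* wraps modulo 2^32 */
    add_flags f;
    f.az = (sum == 0);
    f.an = (sum >> 31) != 0;
    f.ac = sum < ua;                        /* unsigned wraparound means carry */
    /* Overflow: both operands differ in sign from the result. */
    f.av = (((ua ^ sum) & (ub ^ sum)) >> 31) != 0;
    return f;
}
```

For -1 + 1 the operands are 0xFFFFFFFF and 0x00000001: the 32-bit sum is zero (AZ = 1), a carry leaves bit 31 (AC = 1), and there is no signed overflow (AV = 0), which agrees with the values listed in the example.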

  • 7/30/2019 ch024

    40/97

    Ex2-7 Data Operation Status Bits in the SHARC

    For the fixed-point multiplier operation -2 * 3, the ASTAT bits are set as follows: MN = 1, MV = 0, MU = 1, and MI = 0.

    The multiplier has four STKY bits, none of which will be set:

    - MOS (multiplier fixed-point overflow)
    - MVS (multiplier floating-point overflow)
    - MUS (multiplier floating-point underflow)
    - MIS (multiplier floating-point invalid operation)

  • 7/30/2019 ch024

    41/97

    Ex2-7 Data Operation Status Bits in the SHARC

    For the following shifter operation,

    LSHIFT 0x7fffffff BY 3

    the ASTAT bits will be set as follows: SZ = 0, SV = 1, and SS = 0.

    The shifter has no sticky bits.

  • 7/30/2019 ch024

    42/97

    Operands must be loaded into registers before operating on them.

    SHARC supplies special registers that are used to control loading and storing.

    SHARC has two data address generators (DAGs), one for the data memory and the other for the program memory.

  • 7/30/2019 ch024

    43/97

    DAGs

    Data address generator 1 (DAG1) generates 32-bit addresses on the DM Address Bus.

    Data address generator 2 (DAG2) generates 24-bit addresses on the PM Address Bus.

    Each DAG has four types of registers: Index (I), Modify (M), Base (B), and Length (L) registers.

  • 7/30/2019 ch024

    44/97

    DAGs

    - The I register acts as a pointer to memory.
    - The M register contains the increment value for advancing the pointer.
    - The B registers and L registers are used only for circular data buffers.
    - The B register holds the base address (i.e. the first address) of a circular buffer.
    - The L register contains the number of locations in (i.e. the length of) the circular buffer.

  • 7/30/2019 ch024

    45/97

    DAGs

    With two DAGs, the SHARC can perform two load-store operations per cycle.

    The DAG hardware automatically updates its register values so that a series of accesses can be performed very easily.

    This makes the DAGs quite useful for sequential accesses.

  • 7/30/2019 ch024

    46/97

    DAGs

    Each data address generator has eight sets of primary registers.

    Having several sets allows for quicker access to multiple sets of data.

    The registers numbered 0 through 7 belong to DAG1, while registers 8 through 15 belong to DAG2.

  • 7/30/2019 ch024

    47/97

  • 7/30/2019 ch024

    48/97

  • 7/30/2019 ch024

    49/97

    DAGs

    DAGs provide the following addressing modes.

    immediate value

    R0 = DM(0x2000000);
    R0 = DM(_a);    loads R0 with the contents of the variable a
    DM(_a) = R0;    stores R0 into the memory location of a

  • 7/30/2019 ch024

    50/97

    DAGs

    has the entire address in the instruction

    address bits take up most of the instruction, 32 bits/40 bits

  • 7/30/2019 ch024

    51/97

    mode

    - used to sweep through a range of addresses
    - uses an I register and a modifier, which is either an M register or an immediate value
    - the I register specifies the address and is then updated by the modifier value

    R0 = DM(I3,M1);
    DM(I2,1) = R1;

  • 7/30/2019 ch024

    52/97

    addressing

    The address of the location to be fetched is computed as I + M, where I is the base and M is the modifier or offset.

    With I0 = 0x2000000 and M1 = 4,

    R0 = DM(M1,I0);

    loads DM(0x2000004) into R0.
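The two register-indirect forms differ only in operand order, so a small C model may help keep them apart. The functions post_modify and pre_modify are hypothetical names: the first follows the DM(I,M) form, where the I register supplies the address and is then updated, and the second follows the DM(M,I) form, where I + M is the address and I is left unchanged.

```c
#include <stdint.h>

/* Modeled on DM(I, M): use I as the address, then advance I by M. */
uint32_t post_modify(uint32_t *i_reg, int32_t m)
{
    uint32_t addr = *i_reg;       /* address actually accessed */
    *i_reg += (uint32_t)m;        /* pointer update for the next access */
    return addr;
}

/* Modeled on DM(M, I): access I + M, leaving I unchanged. */
uint32_t pre_modify(const uint32_t *i_reg, int32_t m)
{
    return *i_reg + (uint32_t)m;
}
```

Starting from I0 = 0x2000000 with a modifier of 4, pre_modify returns 0x2000004 and leaves I0 alone, while post_modify returns 0x2000000 and leaves I0 at 0x2000004, ready for the next element of a sweep.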

  • 7/30/2019 ch024

    53/97

    A circular buffer is an array of n elements; when the (n + 1)th element is referenced, the reference goes to buffer location 0, wrapping around from the end to the beginning of the buffer.

    The L register is set with a positive, nonzero value (the length of the circular buffer), and the B register of the same number is loaded with the base address of the circular buffer.
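The wraparound can be sketched in C using the same register roles. The struct circ_dag and the function circ_post_modify are hypothetical, and the sketch assumes the modifier's magnitude does not exceed the buffer length, so a single correction step is enough.

```c
#include <stdint.h>

/* Hypothetical model of one DAG register set used as a circular buffer. */
typedef struct {
    uint32_t b;  /* base address of the buffer (B register) */
    uint32_t l;  /* number of locations, nonzero (L register) */
    uint32_t i;  /* current pointer, b <= i < b + l (I register) */
} circ_dag;

/* Return the current address, then advance the pointer by m with wraparound.
 * Assumption: |m| <= l. */
uint32_t circ_post_modify(circ_dag *d, int32_t m)
{
    uint32_t addr = d->i;
    int64_t next = (int64_t)d->i + m;
    if (next >= (int64_t)d->b + d->l)
        next -= d->l;              /* ran past the end: wrap to the start */
    else if (next < (int64_t)d->b)
        next += d->l;              /* ran before the base: wrap to the end */
    d->i = (uint32_t)next;
    return addr;
}
```

With a base of 100 and a length of 4, repeated accesses with a modifier of 1 visit 100, 101, 102, 103 and then 100 again, with no explicit bounds test in the calling loop.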

  • 7/30/2019 ch024

    54/97

    fast Fourier transform (FFT)

    Bit-reversal addressing can be performed only in I0 and I8, as controlled by the BR0 and BR8 bits in the MODE1 register.
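For readers unfamiliar with bit-reversed order, the index permutation used by radix-2 FFTs can be computed in software as follows. This C sketch only illustrates the ordering: the function bit_reverse is a hypothetical name, it reverses an index of a chosen width (assumed to be at most 32 bits), and it does not model how the address hardware performs the reversal.

```c
#include <stdint.h>

/* Reverse the low `bits` bits of index; higher bits are dropped.
 * Assumption: bits <= 32. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (index & 1u);   /* move the lowest bit to the top end */
        index >>= 1;
    }
    return r;
}
```

For an 8-point transform (3 index bits), the natural order 0, 1, 2, 3, 4, 5, 6, 7 maps to 0, 4, 2, 6, 1, 5, 3, 7. Doing this mapping in the address generator avoids a separate reordering pass over the data.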

  • 7/30/2019 ch024

    55/97

    The SHARC allows data to be stored in the program memory, which allows two data fetches per cycle.

    F0 = DM(M0,I0), F1 = PM(M8,I9);

    simultaneously loads F0 from data memory and F1 from program memory.

  • 7/30/2019 ch024

    56/97

    float dm a[N];
    float pm b[N];

    will place the a[] array in data memory and b[] in program memory.

  • 7/30/2019 ch024

    57/97

    Ex2-8 C Assignments in SHARC Instructions

    x = (a + b) - c;

    r0 for a, r1 for b, r2 for c, and r3 for x

    R0 = DM(_a);      ! get value of a
    R1 = DM(_b);      ! load value of b
    R3 = R0 + R1;     ! set result for x to a + b
    R2 = DM(_c);      ! get value of c
    R3 = R3 - R2;     ! complete computation of x
    DM(_x) = R3;      ! store x at proper location

  • 7/30/2019 ch024

    58/97

    Ex2-8 C Assignments in SHARC Instructions

    y = a*(b + c);

    use r0 for a, r1 for b, and r2 for both c and y

    R1 = DM(_b);      ! load b
    R2 = DM(_c);      ! load c
    R2 = R1 + R2;     ! compute partial result for y
    R0 = DM(_a);      ! load a
    R2 = R2 * R0;     ! compute final value of y
    DM(_y) = R2;      ! store y

  • 7/30/2019 ch024

    59/97

    Ex2-8 C Assignments in SHARC

  • 7/30/2019 ch024

    60/97

    Ex2-8 C Assignments in SHARC Instructions

    z = (a << 2) | (b & 15);

    r0 for a and z, r1 for b, and r3 to hold the bit mask to be ANDed

    R0 = DM(_a);              ! get value of a
    R0 = LSHIFT R0 BY #2;     ! perform shift
    R1 = DM(_b);              ! get value of b
    R3 = #15;                 ! set up the bit mask for ANDing
    R1 = R1 AND R3;           ! perform logical AND
    R0 = R1 OR R0;            ! compute final value of z
    DM(_z) = R0;              ! store value of z

  • 7/30/2019 ch024

    61/97

  • 7/30/2019 ch024

    62/97

    JUMP instruction

    JUMP foo jumps to the location foo.

    - Direct: specifies a 24-bit address as an immediate value.
    - Indirect: the address is supplied by the DAG2 data address generator.
    - PC-relative: specifies an immediate value that is added to the current PC.

  • 7/30/2019 ch024

    63/97

    loop instruction

    LCNTR = n, DO Label UNTIL LCE;

    The loop instruction specifies the following:

    - the length of the loop, in the loop counter LCNTR
    - Label, the address of the last instruction in the loop
    - the loop termination condition LCE, which stands for "loop counter expired"

  • 7/30/2019 ch024

    64/97

    True version

    EQ

    LT

    LE

    AC

    AV

    Description

    ALU = 0

    ALU

  • 7/30/2019 ch024

    65/97

    MV          Multiplier overflow    NOT MV
    MS          Multiplier sign        NOT MS
    SV          Shifter overflow       NOT SV
    SZ          Shifter zero           NOT SZ
    FLAG0_IN    Flag 0 input           NOT FLAG0_IN

  • 7/30/2019 ch024

    66/97

    FLAG1_IN    Flag 1 input                NOT FLAG1_IN
    FLAG2_IN    Flag 2 input                NOT FLAG2_IN
    FLAG3_IN    Flag 3 input                NOT FLAG3_IN
    TF          Bit test flag               NOT TF
    LCE         Loop counter expired
    NOT LCE     Loop counter not expired

  • 7/30/2019 ch024

    67/97

    Ex2-9 if statement

    if (a > b) {
        x = 5;
        y = c + d;
    }
    else x = c - d;

  • 7/30/2019 ch024

    68/97

    Ex2-9 if statement

    ! test
    R0 = DM(_a);          ! load a
    R1 = DM(_b);          ! load b
    COMP(R0,R1);          ! compare a,b
    IF LE JUMP fblock;    ! jump if a <= b (test fails)
    ! true block

  • 7/30/2019 ch024

    69/97

    Ex2-9 if statement

    tblock: R0 = 5;       ! get value for x
    DM(_x) = R0;          ! store value for x
    R0 = DM(_c);          ! get c
    R1 = DM(_d);          ! get d
    R1 = R0 + R1;         ! compute c + d
    DM(_y) = R1;          ! save value for y
    JUMP other;           ! skip false block

  • 7/30/2019 ch024

    70/97

    Ex2-9 if statement

    ! false block
    fblock: R0 = DM(_c);  ! get c
    R1 = DM(_d);          ! get d
    R1 = R0 - R1;         ! compute c - d
    DM(_x) = R1;          ! save value for x
    other: ...            ! code after if

  • 7/30/2019 ch024

    71/97

    Ex2-9 if statement

    if (a > b)
        y = c - d;
    else
        y = c + d;

  • 7/30/2019 ch024

    72/97

    Ex2-9 if statement

    ! load values
    R1 = DM(_a);          ! load a
    R8 = DM(_b);          ! load b
    R2 = DM(_c);          ! load c
    R4 = DM(_d);          ! load d
    ! compute both sum and difference

  • 7/30/2019 ch024

    73/97

    Ex2-9 if statement

    r12 = r2 + r4, r0 = r2 - r4;
    ! choose which one to save, copy it into r0 if necessary, then write to y
    comp(r8,r1);          ! compare b,a
    if ge r0 = r12;       ! a

  • 7/30/2019 ch024

    74/97

    When control reaches the last instruction in the loop, the machine immediately returns to the head of the loop unless the loop counter has expired.

    This is called a zero-overhead loop because the jump back to the top of the loop (and its associated delays) is avoided.

  • 7/30/2019 ch024

    75/97

    The loop instruction uses two stacks to handle nested loops (one loop contained inside another).

    The PC is in fact a stack; a separate stack holds the loop counters for all active loops.

    The PC stack is 30 deep and holds subroutine return addresses and loop addresses; the loop counter stack is 6 deep.


    When the DO UNTIL is first encountered:

    - the loop end address is pushed onto the PC stack;
    - the new loop counter value is pushed onto the loop counter stack.

    When the PC reaches the loop end address:

    - the CPU automatically decrements the loop counter and checks its value.


    If the termination condition (which may be LCE or NOT LCE) is not satisfied, the PC is set to the instruction just after the DO UNTIL for another iteration.

    If the condition is satisfied, the two stacks are popped and execution continues at the instruction after the loop end address.
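The two-stack mechanism described above can be modeled in a few lines of C. This is a simplified sketch under stated assumptions: the type `LoopModel` and the functions `do_until`, `at_loop_end`, and `run_nested` are hypothetical names, the stack depths are the ones quoted in the slides, loop counts are assumed to be at least 1, and the real sequencer has details this model ignores.

```c
#include <stdbool.h>

#define PC_STACK_DEPTH   30  /* depth quoted in the slides */
#define LOOP_STACK_DEPTH 6   /* depth quoted in the slides */

typedef struct {
    int pc_stack[PC_STACK_DEPTH];     /* loop end addresses         */
    int cnt_stack[LOOP_STACK_DEPTH];  /* counters of active loops   */
    int pc_top;                       /* entries in pc_stack        */
    int cnt_top;                      /* entries in cnt_stack       */
} LoopModel;

void model_init(LoopModel *m)
{
    m->pc_top = 0;
    m->cnt_top = 0;
}

/* DO ... UNTIL LCE: push the end address and the counter.
 * Returns false if either stack is already full. */
bool do_until(LoopModel *m, int end_addr, int count)
{
    if (m->pc_top >= PC_STACK_DEPTH || m->cnt_top >= LOOP_STACK_DEPTH)
        return false;
    m->pc_stack[m->pc_top++] = end_addr;
    m->cnt_stack[m->cnt_top++] = count;
    return true;
}

/* Called when the PC reaches the loop end address: decrement the
 * innermost counter. Returns true to iterate again; otherwise pops
 * both stacks and returns false. */
bool at_loop_end(LoopModel *m)
{
    if (m->cnt_top == 0)
        return false;                 /* no active loop */
    m->cnt_stack[m->cnt_top - 1]--;
    if (m->cnt_stack[m->cnt_top - 1] > 0)
        return true;
    m->pc_top--;
    m->cnt_top--;
    return false;
}

/* Run a two-level loop nest and count inner-body executions.
 * The end addresses 100 and 200 are arbitrary placeholders. */
int run_nested(int outer, int inner)
{
    LoopModel m;
    int body = 0;

    model_init(&m);
    do_until(&m, 200, outer);
    do {
        do_until(&m, 100, inner);
        do {
            body++;                   /* inner loop body */
        } while (at_loop_end(&m));
    } while (at_loop_end(&m));
    return body;
}
```

In this model the inner loop's counter sits on top of the counter stack while the inner loop is active, which is why nesting needs a stack rather than a single counter register.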


    loop

    for (i = 0, f = 0; i < N; i++)
        f = f + c[i] * x[i];

    ! loop setup
    I0 = _a; ! I0 points to a[0]
    M0 = 1; ! set up increment
    I8 = _b; ! I8 points to b[0]
    M8 = 1; ! set up postincrement mode


    ! loop body
    LCNTR = N, DO loopend UNTIL LCE;
    ! use postincrement mode
    R1 = DM(I0,M0), R2 = PM(I8,M8);
    R8 = R1 * R2; ! multiply
    loopend: R12 = R12 + R8; ! accumulate
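For comparison, here is a C sketch of the same loop written with explicit post-increment pointers, which is roughly what the I and M registers do in the assembly above. The function name `dot_postinc` is hypothetical, and integer data is an assumption made for the illustration.

```c
/* Dot product with post-incremented pointers.
 * pa and pb play the roles of I0 and I8; the step of 1 plays the
 * role of M0 and M8. */
int dot_postinc(const int *a, const int *b, int n)
{
    const int *pa = a;
    const int *pb = b;
    int f = 0;                  /* accumulator (R12 in the assembly) */

    for (int i = 0; i < n; i++) {
        int x = *pa++;          /* R1 = DM(I0,M0): read, then advance */
        int y = *pb++;          /* R2 = PM(I8,M8): read, then advance */
        f += x * y;             /* multiply and accumulate            */
    }
    return f;
}
```

The loop counter `i` in the C version has no counterpart in the assembly loop body, because the hardware loop counter takes its place.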


    optimized:

    ! loop setup
    I4 = _a; ! I4 points to a[0]
    I12 = _b; ! I12 points to b[0]
    R4 = R4 xor R4, R1 = DM(I4,M6), R2 = PM(I12,M14); ! clear R4 and fetch the first operands
    MR0F = R4, MODIFY(I7,M7);


    ! start loop
    LCNTR = 20, DO(PC,loop) UNTIL LCE;
    loop: MRF = MRF + R2*R1 (SSI), R1 = DM(I4,M6), R2 = PM(I12,M14);
    ! loop clean-up
    R0 = MR0F;
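The optimized version overlaps each multiply-accumulate with the fetch of the next pair of operands. A C sketch of that software-pipelining pattern follows; `dot_pipelined` is a hypothetical name and integer data is an assumption. One deliberate difference: the assembly as written appears to fetch one operand pair beyond the last one it uses, whereas this sketch peels the final multiply-accumulate out of the loop so that it never reads past the end of the arrays.

```c
/* Software-pipelined dot product: the operands used in iteration i
 * were fetched during iteration i-1. */
int dot_pipelined(const int *a, const int *b, int n)
{
    int acc = 0;            /* accumulator, cleared before the loop */
    int x, y;

    if (n <= 0)
        return 0;

    x = a[0];               /* prologue: first fetches, before the loop */
    y = b[0];

    for (int i = 1; i < n; i++) {
        acc += x * y;       /* use the pair fetched earlier              */
        x = a[i];           /* fetch the next pair; on the SHARC these   */
        y = b[i];           /* fetches share one instruction with the MAC */
    }

    acc += x * y;           /* epilogue: last pair */
    return acc;
}
```

In C the three statements in the loop body run one after another; the point of the SHARC version is that they fit into a single instruction.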


    Procedure calls use the CALL instruction:

    CALL foo;

    A call can be executed conditionally:

    IF GT CALL (PC,100);

    This is a PC-relative call to a point 100 locations past the current PC value.

    The CALL instruction pushes the current PC value plus 1 onto the PC stack before jumping to the target address.
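The effect of CALL and RTS on the PC stack can be modeled in C as follows. This is a toy model, not a description of the real sequencer: the type `Sequencer` and the functions `do_call`, `do_call_rel`, and `do_rts` are hypothetical names, and the stack depth is the figure quoted earlier in the slides.

```c
#define PC_STACK_DEPTH 30    /* depth quoted in the slides */

typedef struct {
    int pc;                      /* current program counter         */
    int stack[PC_STACK_DEPTH];   /* saved return addresses          */
    int top;                     /* number of saved return addresses */
} Sequencer;

/* CALL target: save pc + 1, then jump. Returns -1 if the stack is full. */
int do_call(Sequencer *s, int target)
{
    if (s->top >= PC_STACK_DEPTH)
        return -1;
    s->stack[s->top++] = s->pc + 1;
    s->pc = target;
    return 0;
}

/* CALL (PC,offset): the target is relative to the current PC. */
int do_call_rel(Sequencer *s, int offset)
{
    return do_call(s, s->pc + offset);
}

/* RTS: pop the return address into the PC. Returns -1 if the stack is empty. */
int do_rts(Sequencer *s)
{
    if (s->top == 0)
        return -1;
    s->pc = s->stack[--s->top];
    return 0;
}
```

For example, a relative call with offset 100 issued at address 10 jumps to 110, and the matching return resumes at address 11.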


    void f1(int a) {
        f2(a);
    }

    The SHARC has a PC stack, so the program does not need to push the return address, only the registers.

    The SHARC does not have general-purpose stack operators, but the DAGs can be used to implement a stack with a little effort.


    Pushing onto the stack uses postincrement mode, so the I register automatically points to the empty location at the top of the stack.

    Reading values off the stack requires specifying a constant offset in the M field to provide the distance from the end of the stack frame to the variable.

    Popping the stack means modifying the I register.


    We use I1 to point to the stack and assume that M1 has been set to 1, the stack push increment, at the start of the program.

    Here is handwritten code for f1(), which includes a call to f2():


    f1: R0 = DM(I1,-1); ! load argument a into R0 from stack
    ! call f2()
    DM(I1,M1) = R0; ! push f2's argument onto the stack
    CALL f2; ! call f2
    ! return from f1()
    MODIFY(I1,-1); ! pop one element off stack
    RTS; ! return
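The stack discipline described above (push with postincrement, read with a constant offset, pop by modifying the index) can be sketched in C. Everything here is illustrative: `DagStack`, `push`, `peek`, `pop`, and `last_f2_arg` are hypothetical names, `STACK_WORDS` is an arbitrary size, `f2` is a stand-in body because the slides do not show one, and bounds checks are omitted for brevity.

```c
#define STACK_WORDS 64           /* arbitrary size for this sketch */

typedef struct {
    int mem[STACK_WORDS];
    int i1;                      /* index of the first empty slot (I1) */
} DagStack;

int last_f2_arg;                 /* records what f2 received */

/* DM(I1,M1) = v with M1 = 1: store, then advance to the next empty slot. */
void push(DagStack *s, int v)
{
    s->mem[s->i1] = v;
    s->i1 += 1;
}

/* Read at a constant (negative) offset from the top; -1 is the last push. */
int peek(const DagStack *s, int offset)
{
    return s->mem[s->i1 + offset];
}

/* MODIFY(I1,-n): discard n elements by moving the index back. */
void pop(DagStack *s, int n)
{
    s->i1 -= n;
}

/* Stand-in for f2(): just records its argument. */
void f2(DagStack *s)
{
    last_f2_arg = peek(s, -1);
}

/* Mirrors the handwritten f1(): read a, push it for f2, call, pop. */
void f1(DagStack *s)
{
    int a = peek(s, -1);         /* load argument a from the stack */
    push(s, a);                  /* push f2's argument             */
    f2(s);                       /* call f2                        */
    pop(s, 1);                   /* pop one element off the stack  */
}
```

A caller would push the argument, call `f1`, and then pop its own argument; after `f1` returns, the index is back where the caller left it.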


    The SHARC allows several operations to be performed simultaneously.

    Many machines offer parallel execution, but it is hidden from the programmer.

    The SHARC's wide instruction word allows the programmer to put together parallel operations.


    The machine supports both memory parallelism and operation parallelism.

    Both reduce the number of instructions required for common operations.

    For example, the basic operation in a dot product loop can be performed in one cycle that performs two fetches, a multiplication, and an addition.


    The modified Harvard architecture allows multiple data fetches in a single instruction.

    The most common instructions allow a memory reference and a computation to be performed at the same time.

    Memory references can be done two at a time in many instructions, with each reference using a DAG.


    The instruction set allows several of the CPU's function units to operate in a single instruction. The supported combinations are:

    - fixed-point multiply-accumulate and add, subtract, or average;
    - floating-point multiplication and ALU operation; and
    - multiplication and dual add-subtract.


    There are restrictions on the sources of the operands when operations are combined.

    The operands going to the multiplier must come from R0 through R7 (or, in the case of floating-point operands, f0 to f7), with one input coming from R0-R3/f0-f3 and the other from R4-R7/f4-f7.


    The ALU operands must come from R8-R15/f8-f15, with one operand coming from R8-R11/f8-f11 and the other from R12-R15/f12-f15.

    For example, this instruction performs three operations:

    R6 = R0 * R4, R9 = R8 + R12, R10 = R8 - R12;
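The register restrictions above are easy to get wrong by hand, so a small checker can help when writing combined instructions. The sketch below encodes only the rules stated in the slides. The function names are hypothetical, registers are passed as plain numbers 0 to 15, and it assumes the first operand comes from the lower register group, as in the slide's example; the slides do not say whether the order matters.

```c
#include <stdbool.h>

/* True if register number r lies in the inclusive range [lo, hi]. */
static bool in_range(int r, int lo, int hi)
{
    return r >= lo && r <= hi;
}

/* Multiplier inputs: one from R0-R3, the other from R4-R7. */
bool mult_operands_ok(int rx, int ry)
{
    return in_range(rx, 0, 3) && in_range(ry, 4, 7);
}

/* ALU inputs: one from R8-R11, the other from R12-R15. */
bool alu_operands_ok(int rx, int ry)
{
    return in_range(rx, 8, 11) && in_range(ry, 12, 15);
}

/* A combined multiply plus ALU operation must satisfy both rules. */
bool multifunction_ok(int mx, int my, int ax, int ay)
{
    return mult_operands_ok(mx, my) && alu_operands_ok(ax, ay);
}
```

With these rules, the example instruction above passes (`multifunction_ok(0, 4, 8, 12)`), while a product such as R1 * R2 fails because both inputs come from the R0-R3 group.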


    All CPUs are similar: they read and write memory, perform data operations, and make decisions.

    There are many ways to design an instruction set, as illustrated by the differences between the ARM and the SHARC.


    When designing complex systems, we generally view programs in high-level language form, which hides many of the details of the instruction set.

    Differences in instruction sets can still be reflected in nonfunctional characteristics, such as program size and speed.