Pipelined MIPS Processor

Upload: ma7ashish

Post on 05-Apr-2018


TRANSCRIPT

  • 7/31/2019 Pipelined MIPS Processor

    1/48

    Erik Jonsson School of Engineering and Computer Science

    The University of Texas at Dallas

    The Pipelined MIPS Processor

    We complete our study of computer architecture by investigating an approach providing even higher performance for the MIPS CPU.

    We first saw how the MIPS CPU performance could be improved by converting the so-called single-cycle CPU to a multi-cycle design.

    In the multi-cycle approach, instead of using a single clock cycle for the whole instruction, the clock is accelerated, and instructions execute in phases over several clock cycles.

    Each instruction phase takes one clock cycle.

    This means that as each instruction executes, only one section of the CPU will be active per clock cycle: the one executing that phase of the instruction.

    This suggests that perhaps we might redesign the CPU slightly so that every section can operate independently on an instruction at the same time.

    N. B. Dodge 01/12    1    Lecture #21: The Pipeline MIPS Processor

  • 7/31/2019 Pipelined MIPS Processor

    2/48

    The Laundry Example

    As an introduction to the concept of pipelining, Patterson and Hennessy use the example of doing one's laundry.

    Most people have, or have access to, a washer and dryer.

    Assume that you need to wash several washer loads of clothing.

    Would anyone divide the clothing into washer loads and then wash, dry, fold, and put away the first load before starting the second?

    No, if you were washing clothes, you would finish washing the first load, move it to the dryer, and start washing the second. If there were more loads to wash, you would begin to fold and put away finished clothing while the later loads were washing and drying.

    We can see this schematically on the next slide.

  • 7/31/2019 Pipelined MIPS Processor

    3/48

    Graphical Example of the Laundry Cycle

  • 7/31/2019 Pipelined MIPS Processor

    4/48

    The Pipeline Processor

    Patterson and Hennessy applied this simultaneous wash-dry-fold-put away concept to the single-cycle computer model.

    The goal is to process several instructions simultaneously, so that the number of clock cycles per instruction is dramatically decreased and instruction throughput rises accordingly.

    In the single-cycle implementation, each instruction completes in one clock cycle, but the clock must be as slow as the slowest instruction.

    In the multi-cycle implementation, the clock runs faster, but instructions take several clock cycles to complete.

    What if, each time the clock ticked, we could process an instruction in each section of the multicycle processor? Then we could process several instructions at once, completing an instruction every clock cycle.

  • 7/31/2019 Pipelined MIPS Processor

    5/48

    Pipeline Architecture

    A pipelined computer executes instructions concurrently.

    Hardware units are organized into stages:

    Execution in each stage takes exactly 1 clock period.

    Each stage passes its partial results to the next stage.

    Unfortunately, as noted earlier, speed = complexity + cost.

    The pipeline approach brings additional expense plus its own set of problems and complications, called hazards, which we will also study.

  • 7/31/2019 Pipelined MIPS Processor

    6/48

    Sequential Versus Pipelined Execution

    [Figure: two timelines over clock cycles 0-10 for the sequence lw $t0, 16($a3); lw $t1, 32($a3); lw $t2, 48($a3); etc. Each instruction passes through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write. The first timeline (sequential execution) is labeled "4 clock cycles"; the second (pipelined execution) is labeled "5 clock cycles".]
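The contrast between the two timelines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every instruction is taken to use all five phases, the sequential machine starts an instruction only after the previous one has finished, and the pipelined machine starts one per clock cycle. The names `STAGES`, `start_cycles`, and `timeline` are hypothetical helpers, not part of the lecture.

```python
# Short stage labels assumed for display; the slide spells them out in full.
STAGES = ["IF", "RF", "ALU", "MEM", "WB"]


def start_cycles(n_instructions, pipelined):
    """Clock cycle in which each instruction enters its first stage."""
    # Assumption: a sequential instruction occupies the machine for all five phases.
    step = 1 if pipelined else len(STAGES)
    return [i * step for i in range(n_instructions)]


def timeline(instructions, pipelined):
    """Return (rows, total_cycles); each row has one 4-character column per cycle."""
    starts = start_cycles(len(instructions), pipelined)
    total = starts[-1] + len(STAGES)
    rows = []
    for text, start in zip(instructions, starts):
        cells = ["."] * total
        for offset, stage in enumerate(STAGES):
            cells[start + offset] = stage
        rows.append(f"{text:<18}" + "".join(f"{c:<4}" for c in cells).rstrip())
    return rows, total


program = ["lw $t0, 16($a3)", "lw $t1, 32($a3)", "lw $t2, 48($a3)"]
for pipelined in (False, True):
    rows, total = timeline(program, pipelined)
    print("pipelined" if pipelined else "sequential", f"({total} clock cycles)")
    print("\n".join(rows))
```

Under these assumptions the three loads occupy 15 clock cycles sequentially and 7 when pipelined.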

  • 7/31/2019 Pipelined MIPS Processor

    7/48

    Speed Advantage of the Pipeline

    The multicycle, serial processor that we studied last lecture can execute n instructions in ns clock periods, or \( ET_S = ns \), where s is the number of clock cycles per instruction.

    A pipelined processor with s stages can execute n instructions in \( ET_P = s + (n - 1) \) clock periods.

    The ideal pipeline speedup depends on the number of stages, and can be greater for more stages (hence Intel's choice of a 20-stage pipeline for the current P-IV).

    Thus the speed advantage of pipeline over multicycle can be defined as:

    \[ S = \frac{ET_S}{ET_P} = \frac{ns}{s + n - 1} \]

    For n much larger than s, S approaches s.
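A small numeric sketch makes the speedup relation concrete. It simply evaluates ns clock periods for the multicycle machine and s + n - 1 for the pipeline; the function names are hypothetical, and the model is the ideal one (no hazards or stalls).

```python
from fractions import Fraction


def multicycle_cycles(n, s):
    """Clock periods for n instructions at s cycles each (serial execution)."""
    return n * s


def pipelined_cycles(n, s):
    """Clock periods for n instructions in an ideal s-stage pipeline."""
    return s + n - 1


def speedup(n, s):
    """Ideal speedup of the pipeline over the multicycle design."""
    return Fraction(multicycle_cycles(n, s), pipelined_cycles(n, s))


for n in (1, 5, 100, 1_000_000):
    print(f"n = {n:>9}: speedup = {float(speedup(n, 5)):.4f}")
```

With s = 5, a single instruction gains nothing, five instructions run 25/9 times faster, and the ratio creeps toward 5 as n grows, which is why the ideal speedup is bounded by the stage count.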

  • 7/31/2019 Pipelined MIPS Processor

    8/48

    Pipeline Stages

    [Figure: one instruction passing through the pipeline stages, including ID/RF, over clock cycles 0-5.]

    The MIPS R2000 pipeline processor is divided into five processing stages:

    1. Instruction fetch (IF)

    2. Instruction decode (ID) and register fetch (RF)

    3. ALU instruction execution (ALU): ALU processing, branch condition evaluation, memory address computation, etc. This is also referred to as execution (EX).

    4. Memory access (MEM)

    5. Write back (WB) to register file

  • 7/31/2019 Pipelined MIPS Processor

    9/48

    Overlapped Pipeline Execution

    [Figure: instructions overlapped in the pipeline over clock cycles 0-7, listed in instruction execution order. Instruction 1 and Instruction 2 each pass through IF, ID/RF, ALU, MEM, and WB; a further instruction begins behind them.]

  • 7/31/2019 Pipelined MIPS Processor

    10/48

    Single-Cycle Datapath

    [Figure: single-cycle datapath showing the PC, Instruction Memory, Control, Reg. Block, Sign Extend, ALU, and Data Memory, with the control signals Branch, Mem. Read, Mem. Write, Mem. To Reg., ALU Op., ALU Srce., Reg. Write, and Reg. Dest. Lines indicate need for storage between stages if processor is converted to pipeline.]

  • 7/31/2019 Pipelined MIPS Processor

    11/48

    Single-Cycle Datapath with Pipeline Registers

    Inter-stage registers are master-slave D flip-flops; the master can be receiving new data from the previous stage of the instruction while the slave flip-flop is providing data to the next stage.

    [Figure: the datapath divided by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, with the master side and slave side of a register marked. Note: Control lines and logic not shown for clarity. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]
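The master-slave behavior of the inter-stage registers can be modeled in software as a two-phase update. The sketch below is a behavioral illustration only, not a hardware description; the class `PipelineRegister` and the function `clock` are hypothetical names.

```python
class PipelineRegister:
    """Behavioral model of a master-slave inter-stage register."""

    def __init__(self, reset_value=None):
        self.master = reset_value  # side written by the previous stage
        self.slave = reset_value   # side read by the next stage

    def capture(self, value):
        """First phase: the master latches the previous stage's output."""
        self.master = value

    def publish(self):
        """Second phase: the slave takes over the master's value."""
        self.slave = self.master


def clock(registers, new_instruction):
    """Advance a chain of registers (IF/ID first) by one clock period."""
    # Every master loads while every slave still presents last cycle's data,
    # so a value moves exactly one stage per clock and is never overwritten early.
    inputs = [new_instruction] + [reg.slave for reg in registers[:-1]]
    for reg, value in zip(registers, inputs):
        reg.capture(value)
    for reg in registers:
        reg.publish()


chain = [PipelineRegister() for _ in range(4)]  # IF/ID, ID/EX, EX/MEM, MEM/WB
for instruction in ("lw", "sub", "and"):
    clock(chain, instruction)
print([reg.slave for reg in chain])
```

Splitting each update into capture and publish is what lets a register accept new data and supply old data within the same clock period.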

  • 7/31/2019 Pipelined MIPS Processor

    12/48

    Instruction Process Through Pipeline (1)

    Stage 1: Instruction loaded into IF/ID

    [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, illustrating stage 1. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    13/48

    Instruction Process Through Pipeline (2)

    Stage 2: Instruction decoded, register data accessed, immediates sign-extended

    [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, illustrating stage 2. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    14/48

    Instruction Process Through Pipeline (3)

    Stage 3: Instruction executed / branch address computed

    [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, illustrating stage 3. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    15/48

    Instruction Process Through Pipeline (4)

    Stage 4: Memory load/store, branch taken/not taken; ALU results bypass memory and are taken to the MEM/WB register

    [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, illustrating stage 4. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    16/48

    Instruction Process Through Pipeline (5)

    Stage 5: Result write-back to dest. register

    [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, illustrating stage 5. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    17/48

    Adding Control

    Control information must be carried along as a part of the instruction, since this information is required at different stages of the pipeline.

    This can be done by adding more inter-stage storage.

    The result is very large inter-stage registers. For example, the storage capacity required between the instruction decode and ALU execution stages (ID/EX register) is more than 120 bits.

    The full pipeline design with control is shown on the next slide.
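To see how an inter-stage register can exceed 120 bits, one can tally the fields it might have to hold. The breakdown below is an assumption in the style of the Patterson and Hennessy datapath, not a count taken from the lecture's diagram: four 32-bit data values, two 5-bit register numbers, and the control signals named on the single-cycle datapath slide, with ALU Op assumed to be 2 bits wide.

```python
# Assumed data fields carried from instruction decode to ALU execution.
ID_EX_DATA = {
    "pc_plus_4": 32,
    "read_data_1": 32,
    "read_data_2": 32,
    "sign_extended_immediate": 32,
    "rt (instruction bits 16-20)": 5,
    "rd (instruction bits 11-15)": 5,
}

# Assumed control bits that still have to be used in the EX, MEM, or WB stage.
ID_EX_CONTROL = {
    "RegDst": 1,
    "ALUOp": 2,
    "ALUSrc": 1,
    "Branch": 1,
    "MemRead": 1,
    "MemWrite": 1,
    "RegWrite": 1,
    "MemToReg": 1,
}


def register_width(*field_groups):
    """Total number of bits across all field groups."""
    return sum(sum(group.values()) for group in field_groups)


print("data bits:   ", register_width(ID_EX_DATA))
print("control bits:", register_width(ID_EX_CONTROL))
print("ID/EX total: ", register_width(ID_EX_DATA, ID_EX_CONTROL))
```

Under these assumptions the register holds 138 data bits plus 9 control bits, consistent with the "more than 120 bits" figure; a different datapath would give a different total.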

  • 7/31/2019 Pipelined MIPS Processor

    18/48

    [Figure: Full Pipeline Design with control. Shows the PC, Instruction Memory, Control Decode, Reg. Block, Sign Extend, ALU and ALU Cont., and Data Memory, divided by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, with the control signals Register Write, ALU Srce, ALU Op, Reg. Dst., Branch, Memory Write, Memory Read, and Memory/ALU Result. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    19/48

    The Pipeline in Action

    The following instruction sequence from the P&H text illustrates the pipeline in action.

    lw $10, 20($1)

    sub $11, $2, $3

    and $12, $4, $5

    or $13, $6, $7

    add $14, $8, $9

    Note that registers are identified by number rather than the letter ids, since that is the way they appear in the processor. As a reminder, $1=$at, $8-14=$t0-t6, $2-3=$v0-v1, $4-7=$a0-a3, etc.
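The cycle-by-cycle snapshots on the following slides can be reproduced with a short simulation. This is a sketch under the ideal-pipeline assumption (one instruction fetched per clock, no stalls); `occupancy` is a hypothetical helper, and cycle 1 here corresponds to the first snapshot in which the lw is fetched.

```python
STAGES = ["IF", "ID/RF", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Map each stage to the instruction it holds in the given cycle (1-based)."""
    state = {}
    for depth, stage in enumerate(STAGES):
        # The instruction in a stage was fetched `depth` cycles earlier.
        index = cycle - 1 - depth
        state[stage] = program[index] if 0 <= index < len(program) else "Idle"
    return state


# Five instructions need 5 + 5 - 1 = 9 cycles to drain through five stages.
for cycle in range(1, len(PROGRAM) + len(STAGES)):
    row = occupancy(PROGRAM, cycle)
    print(cycle, "   ".join(f"{stage}: {row[stage]}" for stage in STAGES))
```

In cycle 5 every stage is busy, and in cycle 9 only the write-back of the add remains.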

  • 7/31/2019 Pipelined MIPS Processor

    20/48

    IF: Idle    ID/RF: Idle    EX: Idle    MEM: Idle    WB: Idle

    [Figure: full pipeline datapath with control, all stages idle. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    21/48

    IF: lw $10, 20($1)    ID/RF: Idle    EX: Idle    MEM: Idle    WB: Idle

    [Figure: full pipeline datapath with control for this clock cycle. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    22/48

    IF: sub $11, $2, $3    ID/RF: lw $10, 20($1)    EX: Idle    MEM: Idle    WB: Idle

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: $1, $10, [$1], 20. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    23/48

    IF: and $12, $4, $5    ID/RF: sub $11, $2, $3    EX: lw $10, 20($1)    MEM: Idle    WB: Idle

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: $2, $3, [$2], [$3], $11, [$1], 20, add, $10. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    24/48

    IF: or $13, $6, $7    ID/RF: and $12, $4, $5    EX: sub $11, $2, $3    MEM: lw $10, 20($1)    WB: Idle

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: $4, $5, $12, [$2], [$3], sub, $11, $10. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    25/48

    IF: add $14, $8, $9    ID/RF: or $13, $6, $7    EX: and $12, $4, $5    MEM: sub $11, $2, $3    WB: lw $10, 20($1)

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: $6, $7, [$6], [$7], $13, [$4], [$5], and, $12, $11, $10. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    26/48

    IF: Idle    ID/RF: add $14, $8, $9    EX: or $13, $6, $7    MEM: and $12, $4, $5    WB: sub $11, $2, $3

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: $8, $9, [$8], [$9], $14, [$6], [$7], or, $13, $12, $11. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    27/48

    IF: Idle    ID/RF: Idle    EX: add $14, $8, $9    MEM: or $13, $6, $7    WB: and $12, $4, $5

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: [$8], [$9], add, $14, $13, $12. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    28/48

    IF: Idle    ID/RF: Idle    EX: Idle    MEM: add $14, $8, $9    WB: or $13, $6, $7

    [Figure: full pipeline datapath with control for this clock cycle. Annotations visible in the diagram: $14, $13. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    29/48

    IF: Idle    ID/RF: Idle    EX: Idle    MEM: Idle    WB: add $14, $8, $9

    [Figure: full pipeline datapath with control for this clock cycle. Annotation visible in the diagram: $14. After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition.]

  • 7/31/2019 Pipelined MIPS Processor

    30/48

    IF: Idle    ID/RF: Idle    EX: Idle    MEM: Idle    WB: Idle

    [Figure: full pipeline datapath with control, all stages idle again.]

  • 7/31/2019 Pipelined MIPS Processor

    31/48

    Pipeline Processor Operation Summary

    Pipelining replaces the single-cycle processor with a multi-stage processor, each stage completing one part of each instruction.

    A new instruction is started every clock cycle.

    Inter-stage registers between the pipeline stages store instruction information (data, write register, branch conditions) from one cycle to the next.

    When the pipeline is filled with instructions, an instruction completes every clock cycle.

  • 7/31/2019 Pipelined MIPS Processor

    32/48

    Exercise 1

    On the diagram on the next page, identify the following:

    1. Highlight all the control lines that must be active during a load word instruction.

    2. As in our exercise in Lecture 20, identify the decoder locations.

    3. The ID/EX Register interface stores the most bits of any of the pipeline section interfaces. Approximately how many bits is that, according to the diagram?

  • 7/31/2019 Pipelined MIPS Processor

    33/48

    Print out a copy of this diagram and bring to class.

    [Figure: full pipeline datapath with control, as used for Exercise 1. Shows the PC, Instruction Memory, Control Decode, Reg. Block, Sign Extend, ALU, and Data Memory, divided by the ID/EX, EX/MEM, and MEM/WB registers, with the control signals Register Write, ALU Srce, ALU Op, Reg. Dst., Branch, Memory Write, Memory Read, and Memory/ALU Result.]

  • 7/31/2019 Pipelined MIPS Processor

    34/48

    Hazards

    Hazards occur because data required for executing the current instruction may not yet be available.

    An instruction in the register fetch cycle may need data from a register whose value will be changed by an instruction downstream but still in process in the pipeline (in the ALU, memory/memory bypass or writeback cycle).

    Thus an upstream instruction could access a register and get incorrect data because the register data has not yet been updated by a downstream instruction.

  • 7/31/2019 Pipelined MIPS Processor

    35/48

    Hazards (2)

    We consider two types of hazards: data hazards and control hazards.

    Both occur because an instruction in the ID/RF stage of the MIPS pipeline needs register data that will be changed by an instruction still in the ALU, MEM/Bypass, or WB stage.

    Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory instruction.

    Control hazards occur when a branch instruction is in process and the data needed to decide the branch is not yet available in the same sort of scenario.

  • 7/31/2019 Pipelined MIPS Processor

    36/48

    Data Hazard in the Pipeline

    [Figure: pipelined timeline over clock cycles 0-10 for the sequence below. Each instruction takes 5 clock cycles, passing through Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, and Reg. Write.]

    sub $2, $1, $3

    and $12, $2, $5

    or $13, $6, $2

    add $14, $2, $2

    sw $15, 100($2)

    In the instruction sequence above, the last four instructions require data from $2, which is changed in the first instruction.

    The $2 data will not be rewritten until cycle 4, so the and and or instructions will fetch incorrect data from $2.

    Even the add may not get the correct information (sw is okay).
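The timing argument can be checked mechanically by comparing, for each later instruction, the cycle in which it fetches its registers with the cycle in which sub writes $2. The sketch assumes the slide's numbering (the first instruction is fetched in cycle 0, register fetch is one cycle after fetch, register write is four cycles after fetch); `classify_reads` is a hypothetical helper that only looks at one producing instruction.

```python
# (mnemonic, destination register or None, source registers)
PROGRAM = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]

ID_RF, WB = 1, 4  # stage offsets from the cycle in which an instruction is fetched


def classify_reads(program, producer_index=0):
    """Classify each later reader of the producer's destination register."""
    _, dest, _ = program[producer_index]
    write_cycle = producer_index + WB
    result = {}
    for i in range(producer_index + 1, len(program)):
        name, _, sources = program[i]
        if dest not in sources:
            continue
        read_cycle = i + ID_RF
        if read_cycle < write_cycle:
            result[name] = "stale"       # reads before the new value is written
        elif read_cycle == write_cycle:
            result[name] = "same cycle"  # depends on register file timing
        else:
            result[name] = "ok"
    return result


print(classify_reads(PROGRAM))
```

The result matches the slide: and and or read a stale $2, the add reads in the very cycle of the write, and sw reads afterward.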

  • 7/31/2019 Pipelined MIPS Porcessor

    37/48

    Control Hazards in the Pipeline

    [Figure: pipeline timing diagram, time axis in clock cycles 0-10. Each instruction takes 5 clock cycles, one per stage: Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, Reg. Write.]

    sub $2, $1, $3

    blt $2, $8, wait

    b[?]t $2, $7, [?]o

    add $14, $2, $2

    sw $15, 100($2)

    Here the problem is changed, with two branch instructions added.

    Neither branch instruction may be executed correctly, once again because the updated $2 data is not yet available.

    This wrong data could cause an incorrect branch.

  • 7/31/2019 Pipelined MIPS Porcessor

    38/48

    Forwarding as a Solution to Data Hazards

    [Figure: pipeline timing diagram for two instructions over clock cycles 0-5, with the ID/RF stage of each instruction labeled.]

    One solution to the problem of data hazards is forwarding.

    Forwarding uses the fact that although instruction 2 needs register data two clock cycles before instruction 1 enters the WB stage, that data is already available as the output of the ALU.

    If a mechanism were available, instruction 1 could forward required register data after its ALU cycle to the ID/RF cycle of instruction 2.

  • 7/31/2019 Pipelined MIPS Porcessor

    39/48

    Forwarding Unit in the Pipeline

    [Figure: forwarding unit in the pipeline datapath. The ID/EX, EX/MEM, and MEM/WB pipeline registers surround the ALU and the data memory. Multiplexers at the ALU inputs select between Read Data 1 / Read Data 2 and forwarded data, under control of the Forward A and Forward B signals. The forwarding unit takes as inputs Rs and Rt from ID/EX, the EX/MEM register Rd, and the MEM/WB register Rd.]

    After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition

  • 7/31/2019 Pipelined MIPS Porcessor

    40/48

    Forwarding Unit Operation

    [Figure: block diagram showing the Reg. Block, ALU, Memory, and Forwarding Unit.]

    The forwarding unit samples register ids in the EX/MEM and MEM/WB registers to determine if source registers in the ID/RF cycle are the same.

    If so, source register data is replaced by pipeline (as yet unwritten) data by the forwarding unit.

    The correct information is thus processed and the instruction can proceed to correct execution.
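
    A minimal sketch of the comparison described above, written as ordinary Python rather than hardware. The conditions follow the usual textbook formulation: forward only if the earlier instruction actually writes a register, never forward register $0, and prefer the newer EX/MEM value over the MEM/WB value. The reg_write flag, the $0 check, and all names here are assumptions not shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class PipelineReg:
    """Subset of a pipeline register's contents used by the forwarding unit."""
    reg_write: bool  # True if the instruction held here will write a register
    rd: int          # destination register number of that instruction


def select_source(src, ex_mem, mem_wb):
    """Choose where one ALU operand should come from."""
    # The newest result (still in EX/MEM) takes priority over the older one.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read from the register block


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return the (Forward A, Forward B) selections for source registers rs and rt."""
    return select_source(rs, ex_mem, mem_wb), select_source(rt, ex_mem, mem_wb)


# and $12, $2, $5 immediately after sub $2, $1, $3: $2 is forwarded from EX/MEM.
print(forwarding_unit(2, 5, PipelineReg(True, 2), PipelineReg(False, 0)))
```

    The two return values play the role of the Forward A and Forward B multiplexer controls; a real design would encode them as control bits rather than strings.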

  • 7/31/2019 Pipelined MIPS Porcessor

    41/48

    Stalls

    Forwarding will not always solve the problems of data hazards.

    For example, suppose an add instruction follows a load word (lw) and the add involves the register that receives the memory data.

    In this case, forwarding alone will not work.

    The loaded data will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction, if it proceeds, will process the wrong data.

    A solution to this problem is the stall.

    A stall halts the instruction awaiting data while the key instruction (here, the lw) completes its MEM cycle, after which the desired data is available to the add.
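
    The condition that triggers a stall can be sketched as a single test. This follows the common textbook load-use check: stall when the instruction now in the ALU stage is a load and its destination register matches a source register of the instruction behind it. The function name and parameter names are hypothetical, and a real hazard detection unit would be combinational logic rather than Python.

```python
def must_stall(ex_is_load, ex_dest, next_rs, next_rt):
    """Return True if the instruction behind a load must wait one cycle.

    ex_is_load: True if the instruction in the ALU stage reads memory (lw)
    ex_dest:    register number the load will write
    next_rs, next_rt: source register numbers of the following instruction
    """
    return ex_is_load and ex_dest in (next_rs, next_rt)


# lw $2, 32($3) followed by add $14, $6, $2: the add needs $2, so it stalls.
print(must_stall(True, 2, 6, 2))
```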

  • 7/31/2019 Pipelined MIPS Porcessor

    42/48

    Result of Stall Approach

    [Figure: pipeline timing diagram, time axis in clock cycles 0-10. Each instruction takes 5 clock cycles, one per stage: Instruc. Fetch, Reg. Fetch, ALU Process, Mem. R/W or ALU Out, Reg. Write.]

    lw $2, 32($3)

    add $14, $6, $2

    sw $15, 80($2)

    Consider the 3 instructions above, the last two depending on the lw.

    $2 contents will be available at the beginning of the WB stage in the first instruction, but not before.

    To resolve the hazard, the add and sw instructions hold place for one cycle.

  • 7/31/2019 Pipelined MIPS Porcessor

    43/48

    Result of Stall Approach (2)

    [Figure: pipeline timing diagram, time axis in clock cycles 0-10, with the same five stages per instruction. The add and sw instructions are each shifted one clock cycle later.]

    lw $2, 32($3)

    add $14, $6, $2 (delayed 1 count)

    sw $15, 80($2) (delayed 1 count)

    With the delay, the lw result feeds the ALU input stage of the add instruction, and the fetch stage of the sw.

    Note that forwarding is still required (this time from the MEM/WB interface, not the ALU output).

    In such a case, instructions following a lw must also be delayed for one clock cycle.
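
    The effect of the one-cycle delay on the schedule can be sketched as follows. The sketch assumes one instruction is fetched per clock cycle starting at cycle 0, that the ALU stage is the third stage, and that forwarding is present, so only an instruction immediately following a lw that uses the loaded register is held. The names alu_cycles and PROGRAM are illustrative.

```python
# Each entry: (text, is_load, destination register or None, source registers).
PROGRAM = [
    ("lw $2, 32($3)", True, "$2", ["$3"]),
    ("add $14, $6, $2", False, "$14", ["$6", "$2"]),
    ("sw $15, 80($2)", False, None, ["$15", "$2"]),
]


def alu_cycles(program):
    """Return (text, cycle in which the instruction occupies the ALU stage)."""
    schedule = []
    delay = 0          # total stall cycles inserted so far
    previous = None
    for index, instruction in enumerate(program):
        text, _is_load, _dest, sources = instruction
        if previous is not None:
            _, prev_is_load, prev_dest, _ = previous
            if prev_is_load and prev_dest in sources:
                delay += 1  # load-use hazard: hold this and all later instructions
        # Without stalls, instruction number `index` reaches the ALU in cycle index + 2.
        schedule.append((text, index + 2 + delay))
        previous = instruction
    return schedule


for text, cycle in alu_cycles(PROGRAM):
    print(f"{text:18s} ALU stage in cycle {cycle}")
```

    With these assumptions the lw uses the ALU in cycle 2, the add in cycle 4 instead of 3, and the sw in cycle 5 instead of 4, which matches the "delayed 1 count" annotations.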

  • 7/31/2019 Pipelined MIPS Porcessor

    44/48

    Other Problems With Branches

    A remaining problem is what to do about instructions following a branch. Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage. This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made. What if:

        The following instructions were for the case of branch taken and the branch was not taken.

        The following instructions were for branch not taken and it was taken.

    In either case, the wrong instructions are in the pipe and they must be eliminated (flushed). How can this problem be prevented?

    A few approaches to the problem are shown in the following slides.

  • 7/31/2019 Pipelined MIPS Porcessor

    45/48

    Control Hazard Approaches (1)

    [Figure: MIPS R-2000 pipeline processor with stages IF, ID/RF, ALU/EX, MEM, WB and an arrow showing the direction of pipeline flow.]

    One approach is to always assume the branch is or is not taken:

    Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the not taken program line (branch determination is made in ALU/EX).

    If this assumption is correct, the pipeline will continue to flow without delay.

    When the branch is taken, instructions in IF and ID/RF must be flushed, usually by changing the op code of those instructions to a nop and letting them proceed to the end of the pipe.

    Clearly, a 2-clock time delay is involved here, and it would be worse for longer pipelines (P-IV pipeline ~ 20 stages).

  • 7/31/2019 Pipelined MIPS Porcessor

    46/48

    Control Hazard Approaches (2)

    [Figure: MIPS R-2000 pipeline processor (IF, ID/RF, ALU/EX, MEM, WB) with a branch comparator added at the ID/RF stage.]

    Reducing the cost of taking the branch:

    In this case, a branch assumption is still made (taken or not taken).

    Since the branch instruction is identified in the ID/RF stage, a comparator can be added there to do the branch/no-branch determination.

    With the branch determination made in this early stage, only one instruction must be flushed, in the IF stage (only a 1-instruction delay).
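
    The saving from the earlier branch decision can be estimated with simple arithmetic. The sketch below assumes a 5-stage pipeline, a branch-never-taken assumption (so only taken branches cost a flush), and no other stalls. The instruction and branch counts are made-up illustration values, and total_cycles is a hypothetical helper.

```python
PIPELINE_STAGES = 5


def total_cycles(instructions, taken_branches, flush_penalty):
    """Clock cycles to run a program when each taken branch flushes flush_penalty instructions."""
    fill_cycles = PIPELINE_STAGES - 1  # cycles before the first instruction completes
    return fill_cycles + instructions + taken_branches * flush_penalty


# Hypothetical workload: 100 instructions, 10 of which are taken branches.
decided_in_alu_stage = total_cycles(100, 10, flush_penalty=2)   # approach (1)
decided_in_id_rf_stage = total_cycles(100, 10, flush_penalty=1)  # approach (2)
print(decided_in_alu_stage, decided_in_id_rf_stage)  # 124 114
```

    For this made-up workload, moving the decision to the ID/RF stage removes 10 of the 20 flush cycles.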

  • 7/31/2019 Pipelined MIPS Porcessor

    47/48

    Control Hazard Approaches (3)

    [Figure: MIPS R-2000 pipeline processor (IF, ID/RF, ALU/EX, MEM, WB) with a branch history block providing branch feedback based on history.]

    Dynamic branch prediction based on recent branch history:

    In this approach, an indicator bit (0/1) gives the last branch condition.

    The next branch prediction can be made according to the bit setting.

    In a loop, for example, the branch is usually taken each time until a substantial number of calculations are complete.

    Some schemes use 2 bits and do not change the prediction until the predictor is wrong twice, after which the alternate behavior is chosen.

    In either case, incorrect predictions will still be made, but hopefully not as often.
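
    The difference between the two schemes can be seen in a small simulation. It models the 2-bit scheme as a saturating counter (values 0-3, predict taken when the counter is 2 or 3), which is one common realization of "wrong twice before changing" and is an assumption here, as are the initial predictor states and the loop-closing branch pattern (taken on every pass except the last). All function names are illustrative.

```python
def loop_branch_outcomes(iterations, calls):
    """Outcomes of a loop-closing branch: taken on every pass except the last of each call."""
    outcomes = []
    for _ in range(calls):
        outcomes.extend([True] * (iterations - 1) + [False])
    return outcomes


def mispredictions_1bit(outcomes, last_taken=True):
    """1-bit scheme: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter: the prediction flips only after two wrong guesses in a row."""
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


outcomes = loop_branch_outcomes(iterations=100, calls=10)
print(mispredictions_1bit(outcomes), mispredictions_2bit(outcomes), len(outcomes))
```

    With these assumptions the 1-bit predictor misses at the loop exit and again at the next loop entry, while the 2-bit predictor misses only at the exit.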

  • 7/31/2019 Pipelined MIPS Porcessor

    48/48

    Exercise 2

    1. Explain forwarding in your own words.

    2. [...] problem be solved?

    3. Why could 2-bit dynamic branch prediction work to ensure about a 1% error rate in branch prediction in a subroutine that loops about 100 times before exiting? Assume that the subroutine is called frequently, and that it always executes 100 or more loop traversals before returning to the calling program.