pipeline performance

37
Pipeline Performance Assume time for stages is 100ps for register read or write 200ps for other stages Compare pipelined datapath with single- cycle datapath Instr Instr fetch Registe r read ALU op Memory access Registe r write Total time lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R- format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps

Upload: kamal-warner

Post on 02-Jan-2016

24 views

Category:

Documents


1 download

DESCRIPTION

Pipeline Performance. Assume time for stages is 100ps for register read or write 200ps for other stages Compare pipelined datapath with single-cycle datapath. Pipeline Performance. Single-cycle (T c = 800ps). Pipelined (T c = 200ps). Pipeline Speedup. If all stages are balanced - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Pipeline Performance

Pipeline Performance

Assume time for stages is 100ps for register read or write 200ps for other stages

Compare pipelined datapath with single-cycle datapath

Instr Instr fetch

Register read

ALU op Memory access

Register write

Total time

lw 200ps 100 ps 200ps 200ps 100 ps 800ps

sw 200ps 100 ps 200ps 200ps 700ps

R-format 200ps 100 ps 200ps 100 ps 600ps

beq 200ps 100 ps 200ps 500ps

Page 2: Pipeline Performance

Pipeline PerformanceSingle-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)

Page 3: Pipeline Performance

Pipeline Speedup

If all stages are balanced i.e., all take the same time Time between instructionspipelined

= Time between instructionsnonpipelined

Number of stages If not balanced, speedup is less Speedup due to increased throughput

Latency (time for each instruction) does not decrease

Page 4: Pipeline Performance

Pipelining and ISA Design

• MIPS ISA designed for pipelining• All instructions are 32-bits

• Easier to fetch and decode in one cycle• c.f. x86: 1- to 17-byte instructions

• Few and regular instruction formats• Can decode and read registers in one step

• Load/store addressing• Can calculate address in 3rd stage, access

memory in 4th stage• Alignment of memory operands

• Memory access takes only one cycle

Page 5: Pipeline Performance

5

Improving performance

Two ideas for improving performance:

1. Spilt each instruction into multiple steps, each taking 1 cycle steps: IF (instruction fetch), ID (instruction decode), EX

(execute ALU operation), MEM (memory access), WB (register write-back)

slow instructions take more cycles than fast instructions known as a multi-cycle implementation

2. Crucial observation: each instruction uses only a portion of the datapath in each step

can overlap instructions; each uses one portion of the datapath

known as a pipelined implementation

Examples of pipelining: any assembly process (cars, sandwiches), multiple loads of laundry (washer + dryer can be pipelined), etc.

Page 6: Pipeline Performance

6

Pipelining not just Multiprocessing

Pipelining does involve parallel processing, but in a specific way

Both multiprocessing and pipelining relate to the processing of multiple “things” using multiple “functional units” — In multiprocessing, each thing is processed entirely by a single

functional unit• e.g. multiple lanes at the supermarket

— In pipelining, each thing is broken into a sequence of pieces, where each piece is handled by a different (specialized) functional unit

• e.g. checker vs. bagger

Pipelining and multiprocessing are not mutually exclusive— Modern processors do both, with multiple pipelines (e.g.

superscalar)

Pipelining is a general-purpose efficiency technique; used elsewhere in CS:— Networking, I/O devices, server software architecture

Page 7: Pipeline Performance

7

Instruction Fetch (IF)

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

While IF is executing, the rest of the data path is sitting idle…

Page 8: Pipeline Performance

8

Instruction Decode (ID)

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Then while ID is executing, the IF-related portion becomes idle…

Page 9: Pipeline Performance

9

Execute (EX)

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

..and so on for the EX portion…

Page 10: Pipeline Performance

10

Memory (MEM)

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

…the MEM portion…

Page 11: Pipeline Performance

11

Write back (WB)

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

…and the WB portion

Page 12: Pipeline Performance

12

Decoding and fetching together

Why don’t we go ahead and fetch the next instruction while we’re decoding the first one?

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Readaddress

Decode 1st instructionFetch 2nd

Page 13: Pipeline Performance

13

Executing, decoding and fetching Similarly, once the first instruction enters its Execute stage, we

can go ahead and decode the second instruction. But now the instruction memory is free again, so we can fetch the

third instruction!

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Decode 2ndFetch 3rd

Execute 1st

Page 14: Pipeline Performance

14

Break datapath into 5 stages

Each stage has its own functional units Full pipeline the datapath is simultaneously working on 5

instructions!

Readaddress

Instructionmemory

Instruction[31-0]

Readaddress

Writeaddress

Writedata

Datamemory

Readdata

MemWrite

MemRead

1

Mux

0

MemToReg

Signextend

0

Mux

1

ALUSrc

Result

ZeroALU

ALUOp

I [15 - 0]

I [25 - 21]

I [20 - 16]

I [15 - 11]

0

Mux

1

RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

IDIF EXE MEM WB

newest oldest

Page 15: Pipeline Performance

15

A pipeline diagram

A pipeline diagram shows the execution of a series of instructions— The instruction sequence is shown vertically, from top to

bottom— Clock cycles are shown horizontally, from left to right— Each instruction is divided into its component stages

This clearly indicates the overlapping of instructions. For example, there are three instructions active in the third cycle above.— The “lw” instruction is in its Execute stage.— Simultaneously, the “sub” is in its Instruction Decode stage.— Also, the “and” instruction is just being fetched.

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WB

addi $sp, $sp, -4 IF ID EX MEM WB

Page 16: Pipeline Performance

16

Pipeline terminology

The pipeline depth is the number of stages—in this case, five In the first four cycles here, the pipeline is filling, since there are

unused functional units In cycle 5, the pipeline is full. Five instructions are being executed

simultaneously, so all hardware units are in use In cycles 6-9, the pipeline is emptying

filling full emptying

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WB

add $sp, $sp, -4 IF ID EX MEM WB

Page 17: Pipeline Performance

17

Pipelining Performance

Execution time on ideal pipeline:— time to fill the pipeline + one cycle per instruction— How long for N instructions? k 1 + N, where k = pipeline

depth Alternate way of arriving at this formula: k cycles for the first

instruction, plus 1 for each of the remaining N 1 instructions.

Compare this pipelined implementation (2ns clock period) vs. a single cycle implementation (8ns clock period). How much faster is pipelining for N=1000 ?

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBlw $t1, 8($sp) IF ID EX ME

MWB

lw $t2, 12($sp) IF ID EX MEM

WB

lw $t3, 16($sp) IF ID EX MEM WBlw $t4, 20($sp) IF ID EX MEM WB

filling

Page 18: Pipeline Performance

18

Pipeline Datapath: Resource Requirements

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBlw $t1, 8($sp) IF ID EX ME

MWB

lw $t2, 12($sp) IF ID EX MEM

WB

lw $t3, 16($sp) IF ID EX MEM WBlw $t4, 20($sp) IF ID EX MEM WB We need to perform several operations in the same cycle.

— Increment the PC and add registers at the same time. — Fetch one instruction while another one reads or writes data.

What does that mean for our hardware?— Separate ADDER and ALU— Two memories (instruction memory and data memory)

Page 19: Pipeline Performance

19

Single-cycle datapath, slightly rearranged

MemToReg

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

4

Shiftleft 2

PC

Add

1

0

PCSrc

Signextend

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0]RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

Page 20: Pipeline Performance

20

Pipeline registers In pipelining, we divide instruction execution into multiple cycles

— IF ID EX MEM WB Information computed during one cycle may be needed in a later

cycle:— Instruction read in IF stage determines which registers are

fetched in ID stage, what immediate is used for EX stage, and what destination register is for WB

— Register values read in ID are used in EX and/or MEM stages— ALU output produced in EX is an effective address for MEM or a

result for WB

A lot of information to save!— Saved in intermediate registers called pipeline registers

The registers are named for the stages they connect:

IF/ID ID/EX EX/MEM MEM/WB

No register is needed after the WB stage, because after WB the instruction is done

Page 21: Pipeline Performance

21

Pipelined datapath

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

Signextend

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0]RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID ID/EX EX/MEM MEM/WB

1

0

PCSrc

PC

Page 22: Pipeline Performance

22

Propagating values forward

Data values required later propagated through the pipeline registers

The most extreme example is the destination register (rd or rt)— It is retrieved in IF, but isn’t updated until the WB— Thus, it must be passed through all pipeline stages, as

shown in red on the next slide

Notice that we can’t keep a single “instruction register,” because the pipelined machine needs to fetch a new instruction every clock cycle

Page 23: Pipeline Performance

23

The destination register

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0]RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID ID/EX EX/MEM MEM/WB

1

0

PCSrc

PC

Signextend

Page 24: Pipeline Performance

24

What about control signals?

Control signals generated similar to the single-cycle processor— in the ID stage, the processor decodes the instruction fetched in

IF and produces the appropriate control values

Some of the control signals will not be needed until later stages— These signals must be propagated through the pipeline until

they reach the appropriate stage— We just pass them in the pipeline registers, along with the data

Control signals can be categorized by the pipeline stage that uses them Stag

eControl signals needed

EX ALUSrc ALUOp RegDst

MEM MemRead MemWrite PCSrc

WB RegWrite MemToReg

Page 25: Pipeline Performance

25

Pipelined data path and control

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0]RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

PC

1

0

PCSrc

Signextend

EX

M

WB

Page 26: Pipeline Performance

26

Here’s a sample sequence of instructions to execute

1000: lw $8, 4($29)1004: sub $2, $4, $51008: and $9, $10, $111012: or $16, $17, $181016: add $13, $14, $0

We’ll make some assumptions, just so we can show actual data values:— Each register contains its number plus 100. For instance,

register $8 contains 108, register $29 contains 129, etc.— Every data memory location contains 99

Our pipeline diagrams will follow some conventions:— An X indicates values that aren’t important, like the constant

field of an R-type instruction— Question marks ??? indicate values we don’t know, usually

resulting from instructions coming before and after the ones in our example

An example execution sequence

addresses in decimal

Page 27: Pipeline Performance

27

Cycle 1 (filling)IF: lw $8, 4($29) MEM: ??? WB: ???EX: ???ID: ???

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (?)

MemRead (?)

1

0

MemToReg(?)

Shiftleft 2

Add

1

0

PCSrc

ALUSrc (?)

Result

ZeroALU

ALUOp (???)

RegDst (?)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (?)

Add

0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

1000

1004

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

???

??????

???

4

PC

Signextend

EX

M

WB

Page 28: Pipeline Performance

28

Cycle 2ID: lw $8, 4($29)IF: sub $2, $4, $5 MEM: ??? WB: ???EX: ???

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

1

0

4

Shiftleft 2

Add

PCSrc

Result

ZeroALU

4

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

Add

X

80

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

100429

X

1008

129

X

MemToReg(?)

???

???

???

???

???

???

RegWrite (?)

MemWrite (?)

MemRead (?)

???

???

???

ALUSrc (?)

ALUOp (???)

RegDst (?)

???

???

???

???

??????

???

PC

Signextend

EX

M

WB

1

0

Page 29: Pipeline Performance

29

Cycle 3ID: sub $2, $4, $5IF: and $9, $10, $11 EX: lw $8, 4($29) MEM: ??? WB: ???

MemToReg(?)

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (?)

MemRead (?)

1

0

4

Shiftleft 2

Add

PCSrc

ALUSrc (1)

Result

ZeroALU

ALUOp (add)

XRegDst (0)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

Add

2

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

10084

5

1012

104

105

129

4

X

8

X8

133 4

???

???

???

???

???

???

RegWrite (?)

???

???

???

PC

Signextend

1

0

EX

M

WB

Page 30: Pipeline Performance

30

Cycle 4ID: and $9, $10, $11IF: or $16, $17, $18 EX: sub $2, $4, $5 MEM: lw $8, 4($29) WB: ???

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (1)

1

0

MemToReg(?)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (sub)

XRegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (?)

Add

9

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

101210

11

1016

110

111

104

X

105

X

22

–1

133

X 99

8

???

???

???

???

???

???

PC

Signextend

EX

M

WB

1

0

Page 31: Pipeline Performance

31

Cycle 5 (full)ID: or $16, $17, $18IF: add $13, $14, $0 EX: and $9, $10, $11 MEM: sub $2, $4, $5 WB:

lw $8, 4($29)

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(1)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (and)

XRegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

16

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

101617

18

1020

117

118

110

X

111

X

99

110

-1

105 X

2

99

133

99

8

99

8

PC

Signextend

EX

M

WB

1

0

Page 32: Pipeline Performance

32

Cycle 6 (emptying)ID: add $13, $14, $0IF: ??? EX: or $16, $17, $18 MEM: and $9, $10, $11 WB: sub

$2, $4, $5

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (or)

XRegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

13

X0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

102014

0

???

114

0

117

X

118

X

1616

119

110

111 X

9

X

-1

-1

2

-1

2

PC

Signextend

1

0

EX

M

WB

Page 33: Pipeline Performance

33

Cycle 7ID: ???IF: ??? EX: add $13, $14, $0 MEM: or $16, $17, $18 WB: and

$9, $10, $11

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (0)

Result

ZeroALU

ALUOp (add)

???RegDst (1)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

???

???0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

???

???

???

???

114

X

0

X

1313

114

119

118 X

16

X

110

110

9

110

9

PC

Signextend

???

???

EX

M

WB

1

0

Page 34: Pipeline Performance

34

Cycle 8ID: ???IF: ??? EX: ??? MEM: add $13, $14, $0 WB: or $16,

$17, $18

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (0)

MemRead (0)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (?)

Result

ZeroALU

ALUOp (???)

???RegDst (?)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

???

???0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

???

???

???

??????

???

114

0 X

13

X

119

119

16

119

16

PC

Signextend

???

??? ???

???

???

???

???

1

0

EX

M

WB

Page 35: Pipeline Performance

35

Cycle 9ID: ???IF: ??? EX: ??? MEM: ??? WB: add

$13, $14, $0

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite (?)

MemRead (?)

1

0

MemToReg(0)

4

Shiftleft 2

Add

PCSrc

ALUSrc (?)

Result

ZeroALU

ALUOp (???)

???RegDst (?)

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite (1)

Add

???

???0

1

0

1

IF/ID

ID/EX

EX/MEM

MEM/WB Control

M

WB

WB

???

???

???

???

??????

???

???

? X

???

X

114

114

13

114

13

PC

Signextend

???

???

???

???

???

???

1

0

EX

M

WB

Page 36: Pipeline Performance

36

That’s a lot of diagrams there

Compare the last few slides with the pipeline diagram above— You can see how instruction executions are overlapped— Each functional unit is used by a different instruction in each

cycle— The pipeline registers save control and data values generated

in previous clock cycles for later use— When the pipeline is full in clock cycle 5, all of the hardware

units are utilized. This is the ideal situation, and what makes pipelined processors so fast

Clock cycle1 2 3 4 5 6 7 8 9

lw $t0, 4($sp) IF ID EX MEM WBsub $v0, $a0, $a1

IF ID EX MEM

WB

and $t1, $t2, $t3 IF ID EX MEM

WB

or $s0, $s1, $s2

IF ID EX MEM WB

add $t5, $t6, $0 IF ID EX MEM WB

Page 37: Pipeline Performance

37

Note how everything goes left to right, except

Readaddress

Instructionmemory

Instruction[31-0]

Address

Writedata

Data memory

Readdata

MemWrite

MemRead

1

0

MemToReg

4

Shiftleft 2

Add

Signextend

ALUSrc

Result

ZeroALU

ALUOp

Instr [15 - 0]RegDst

Readregister 1

Readregister 2

Writeregister

Writedata

Readdata 2

Readdata 1

Registers

RegWrite

Add

Instr [15 - 11]

Instr [20 - 16]0

1

0

1

IF/ID ID/EX EX/MEM MEM/WB

1

0

PCSrc

PC