pipelining 國立清華大學資訊工程學系 黃婷婷教授. outline an overview of pipelining a...
TRANSCRIPT
Pipelining
國立清華大學資訊工程學系黃婷婷教授
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use (hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
2
Laundry example:
小安,小白,小新,小德 each have one load ofclothes to wash, dry,and fold
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes
A B C D
Pipelining Is Natural!
3
Sequential laundry takes 6 hours for 4 loadsIf they learned pipelining, how long would it take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
Sequential Laundry
4
Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
Task
Order
Time
30 40 40 40 40 20
Pipelined Laundry: Start ASAP
5
Pipelining LessonsDoesn’t help latency of single task, but throughput of entire
Pipeline rate limited by slowest stage
Multiple tasks working at same time using different resources
Potential speedup = Number pipe stages
Unbalanced stage length; time to “fill” & “drain” the pipeline reduce speedup
Stall for dependences
A
B
C
D
6 PM 7 8 9
Task
Order
Time
30 40 40 40 40 20
6
Single cycle vs. Pipeline
Clk
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load
Pipeline Implementation:
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem WrStore
Ifetch Reg Exec Mem WrR-type
Cycle 1 Cycle 2
7
Pipeline PerformanceSingle-cycle (Tc= 800ps)
Pipelined (Tc= 200ps)
8
Instr.
Order
Time (clock cycles)
Inst 0
Inst 1
Inst 2
Inst 4
Inst 3
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
AL
UIm Reg Dm Reg
Why Pipeline? Because the Resources Are There!
Single-cycle
Datapath
9
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use (hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
10
Designing a Pipelined Processor
Starting with single cycle datapathSingle cycle control?
Partition datapath into stages:IF (instruction fetch), ID (instruction decode
and register file read), EX (execution or address calculation), MEM (data memory access), WB (write back)
Associate resources with stages Ensure that flows do not conflict, or figure
out how to resolve Assert control in appropriate stage
11
Multi-Execution Steps
Step nameAction for R-type
instructionsAction for memory-reference
instructions Action for branches Action for jumpsInstruction fetch IR = Memory[PC]
PC = PC + 4Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = PC+ (IR[25-0]<<2)jump completion sign-ext(IR[15-0])<<2
Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or
Store: Memory [ALUOut] = B
Memory read completion Load: Reg[IR[20-16]] = MDR
But, use single-cycle datapath ...
12
Split Single-cycle Datapath
What to add to split the datapath into stages?
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Instruction
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
ReaddataAddress
Datamemory
1
ALUresult
Mux
ALUZero
IF: Instruction fetch ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access WB: Write back
FeedbackPath
13
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Pipeline registers (latches)
Add Pipeline Registers
Use registers between stages to carry data and control
14
IF: Instruction Fetch Fetch the instruction from the Instruction
Memory ID: Instruction Decode
Registers fetch and instruction decode EX: Calculate the memory address MEM: Read the data from the Data Memory WB: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Consider load
15
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Ifetch Reg/Dec Exec Mem Wr1st lw
Ifetch Reg/Dec Exec Mem Wr2nd lw
Ifetch Reg/Dec Exec Mem Wr3rd lw
Pipelining load
5 functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register File’s Read ports (busA and busB) for
the Reg/Dec stage ALU for the Exec stage Data Memory for the MEM stage Register File’s Write port (busW) for the WB
stage
16
IR = mem[PC]; PC = PC + 4
IF Stage of load
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction fetch
lw
Address
Datamemory
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Instruction decode
lw
Address
Datamemory
IR, PC+4
17
ID Stage of load A = Reg[IR[25-21]]; B = Reg[IR[20-16]];
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction fetch
lw
Address
Datamemory
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Instruction decode
lw
Address
Datamemory
18
EX Stage of load
ALUout = A + sign-ext(IR[15-0])
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Execution
lw
Address
Datamemory
19
MEM State of load
MDR = mem[ALUout]
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
Datamemory
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Memory
lw
Address
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writedata
ReaddataData
memory
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Write back
lw
Writeregister
Address
97108/Patterson Figure 06.15
20
WB Stage of load
Reg[IR[20-16]] = MDR
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
Datamemory
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Memory
lw
Address
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writedata
ReaddataData
memory
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Write back
lw
Writeregister
Address
97108/Patterson Figure 06.15
Who will supply
this address?
21
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec WrR-type
The Four Stages of R-type
IF: fetch the instruction from the Instruction Memory
ID: registers fetch and instruction decode EX: ALU operates on the two register
operands WB: write ALU output back to the register file
22
We have a structural hazard: Two instructions try to write to the register file
at the same time! Only one write port
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec WrR-type
Ifetch Reg/Dec Exec WrR-type
Ops! We have a problem!
Pipelining R-type and load
23
Important Observation
Ifetch Reg/Dec Exec Mem WrLoad
1 2 3 4 5
Ifetch Reg/Dec Exec WrR-type
1 2 3 4
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions: Load uses Register File’s write port during its
5th stage
R-type uses Register File’s write port during its 4th stage
Several ways to solve: forwarding, adding pipeline bubble, making instructions same length
24
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Mem WrR-type
Ifetch Reg/Dec Exec WrR-type Mem
Exec
Exec
Exec
Exec
1 2 3 4 5
Solution: Delay R-type’s Write
Delay R-type’s register write by one cycle: R-type also use Reg File’s write port at Stage 5 MEM is a NOP stage: nothing is being done.
R-type also has 5 stages
25
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec MemStore Wr
The Four Stages of store
IF: fetch the instruction from the Instruction Memory
ID: registers fetch and instruction decode EX: calculate the memory address MEM: write the data into the Data Memory
Add an extra stage: WB: NOP
26
IF: fetch the instruction from the Instruction Memory
ID: registers fetch and instruction decode EX:
compares the two register operandselect correct branch target addresslatch into PC
Add two extra stages: MEM: NOP WB: NOP
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec MemBeq Wr
The Three Stages of beq
27
Graphically Representing Pipelines
Can help with answering questions like: How many cycles to execute this code? What is the ALU doing during cycle 4? Help understand datapaths
IM Reg DM Reg
IM Reg DM Reg
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
lw $10, 20($1)
Programexecutionorder(in instructions)
sub $11, $2, $3
ALU
ALU
28
Example 1: Cycle 1
29
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction decode
lw $10, 20($1)
Instruction fetch
sub $11, $2, $3
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction fetch
lw $10, 20($1)
Address
Datamemory
Address
Datamemory
Clock 1
Clock 2
Example 1: Cycle 2
30
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction decode
lw $10, 20($1)
Instruction fetch
sub $11, $2, $3
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction fetch
lw $10, 20($1)
Address
Datamemory
Address
Datamemory
Clock 1
Clock 2
Instructionmemory
Address
4
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
3216Sign
extend
Writeregister
Writedata
Memory
lw $10, 20($1)
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Execution
sub $11, $2, $3
Instructionmemory
Address
4
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Execution
lw $10, 20($1)
Instruction decode
sub $11, $2, $3
3216Sign
extend
Address
Datamemory
Datamemory
Address
Clock 3
Clock 4
Example 1: Cycle 3
31
Instructionmemory
Address
4
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
3216Sign
extend
Writeregister
Writedata
Memory
lw $10, 20($1)
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Execution
sub $11, $2, $3
Instructionmemory
Address
4
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Execution
lw $10, 20($1)
Instruction decode
sub $11, $2, $3
3216Sign
extend
Address
Datamemory
Datamemory
Address
Clock 3
Clock 4
Example 1: Cycle 4
32
Instructionmemory
Address
4
32
0
Add Addresult
1
ALUresult
Zero
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEMID/EX MEM/WB
Write backMux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Mux
ALUReaddata
Writeregister
Writedata
lw $10, 20($1)
Instructionmemory
Address
4
32
0
Add Addresult
1
ALUresult
Zero
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEMID/EX MEM/WB
Write backMux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Mux
ALUReaddata
Writeregister
Writedata
sub $11, $2, $3
Memory
sub $11, $2, $3
Address
Datamemory
Address
Datamemory
Clock 6
Clock 5
Example 1: Cycle 5
33
Instructionmemory
Address
4
32
0
Add Addresult
1
ALUresult
Zero
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEMID/EX MEM/WB
Write backMux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Mux
ALUReaddata
Writeregister
Writedata
lw $10, 20($1)
Instructionmemory
Address
4
32
0
Add Addresult
1
ALUresult
Zero
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEMID/EX MEM/WB
Write backMux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Mux
ALUReaddata
Writeregister
Writedata
sub $11, $2, $3
Memory
sub $11, $2, $3
Address
Datamemory
Address
Datamemory
Clock 6
Clock 5
Example 1: Cycle 6
34
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use (hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
35
Pipeline Control: Control Signals
PC
Instructionmemory
Address
Inst
ruct
ion
Instruction[20– 16]
MemtoReg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0Registers
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1Write
data
Read
data Mux
1
ALUcontrol
RegWrite
MemRead
Instruction[15– 11]
6
IF/ID ID/EX EX/MEM MEM/WB
MemWrite
Address
Datamemory
PCSrc
Zero
AddAdd
result
Shiftleft 2
ALUresult
ALU
Zero
Add
0
1
Mux
0
1
Mux
36
Execution/Address Calculationstage control lines
Memory access stagecontrol lines
Write-back stagecontrol lines
RegDst
ALUOp1
ALUOp0
ALUSrc Branch
MemRead
MemWrite
Regwrite
Mem toReg
1 1 0 0 0 0 0 1 00 0 0 1 0 1 0 1 1X 0 0 1 0 0 1 0 XX 0 1 0 1 0 0 0 X
Fig. 4.22
Group Signals According to Stages
Can use control signals of single-cycle CPU
37
Pass control signals along just like the data Main control generates control signals during
ID
Data Stationary Control
Control
EX
M
WB
M
WB
WB
IF/ID ID/EX EX/MEM MEM/WB
Instruction
Fig. 4.5038
IF/ID
Register
ID/E
x Register
Ex/M
EM
Register
ME
M/W
B R
egister
ID EX MEM
ExtOp
ALUOp
RegDst
ALUSrc
Branch
MemWr
MemtoReg
RegWr
MainControl
ExtOp
ALUOp
RegDst
ALUSrc
MemtoReg
RegWr
MemtoReg
RegWr
MemtoReg
RegWr
Branch
MemWr
Branch
MemW
WB
Data Stationary Control (cont.)
Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later
Signals for MEM (MemWr, Branch) are used 2 cycles later
Signals for WB (MemtoReg, MemWr) are used 3 cycles later
39
WB Stage of load
Reg[IR[20-16]] = MDR
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
Datamemory
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Memory
lw
Address
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writedata
ReaddataData
memory
1
ALUresult
Mux
ALUZero
ID/EX MEM/WB
Write back
lw
Writeregister
Address
97108/Patterson Figure 06.15
Who will supply
this address?
40
Datapath with Control
PC
Instructionmemory
Inst
ruct
ion
Add
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0
Mux
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
Writedata
Readdata
Mux
1
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
6
EX
M
WB
M
WB
WBIF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
Mux
0
1
Mem
Write
AddressData
memory
Address
41
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
Let’s Try it Out
42
Example 2: Cycle 1
43
Instructionmemory
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
Instruction[15– 0]
0
Mux
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
EX
M
WB
M
WB
WBIn
stru
ctio
n
IF/ID EX/MEMID/EX
ID: before<1> EX: before<2> MEM: before<3> WB: before<4>
MEM/WB
IF: lw $10, 20($1)
000
00
0000
000
00
000
0
00
00
0
0
0
Mux
0
1
Add
PC
0
Datamemory
Address
Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
Mux
0
1
Add Addresult
Writeregister
Writedata
Mux
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
RegW
rite
ALU
M
WB
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: before<3>
MEM/WB
IF: sub $11, $2, $3
010
11
0001
000
00
000
0
00
00
0
0
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
lwControl
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
X
10
20
X
1
Instruction[20– 16]
Instruction[15– 0] Sign
extend
Instruction[15– 11]
20
$X
$1
10
X
Mem
Write
MemRead
Mem
Writ
e
Datamemory
Address
Address
Address
Clock 2
Clock 1
Example 2: Cycle 2
44
Instructionmemory
Instruction[20– 16]
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
Instruction[15– 0]
0
Mux
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
EX
M
WB
M
WB
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: before<1> EX: before<2> MEM: before<3> WB: before<4>
MEM/WB
IF: lw $10, 20($1)
000
00
0000
000
00
000
0
00
00
0
0
0
Mux
0
1
Add
PC
0
Datamemory
Address
Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
Mux
0
1
Add Addresult
Writeregister
Writedata
Mux
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
RegW
rite
ALU
M
WB
WBIn
stru
ctio
n
IF/ID EX/MEMID/EX
ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: before<3>
MEM/WB
IF: sub $11, $2, $3
010
11
0001
000
00
000
0
00
00
0
0
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
lwControl
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
X
10
20
X
1
Instruction[20– 16]
Instruction[15– 0] Sign
extend
Instruction[15– 11]
20
$X
$1
10
X
Mem
Write
MemRead
Mem
Writ
e
Datamemory
Address
Address
Address
Clock 2
Clock 1
Instructionmemory
Address
Instruction[20– 16]
Mem
toR
eg
Branch
ALUSrc
4
Instruction[15– 0]
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
ALUresult
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
EX
M
WB
WBIn
stru
ctio
n
IF/ID EX/MEMID/EX
ID: sub $11, $2, $3 EX: lw $10, . . . MEM: before<1> WB: before<2>
MEM/WB
IF: and $12, $4, $5
000
10
1100
010
11
000
1
00
00
0
0
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
Writeregister
Writedata 1
ALUresult
ALUcontrol
Shiftleft 2
RegW
rite
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: and $12, $2, $3 EX: sub $11, . . . MEM: lw $10, . . . WB: before<1>
MEM/WB
IF: or $13, $6, $7
000
10
1100
000
10
101
0
11
10
0
0
0
Mux
0
1
Add
PC
0Writedata
Mux
1
andControl
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
12
X
X
5
4
Instruction[20– 16]
Instruction[15– 0]
Instruction[15– 11]
X
$5
$4
X
12
Mem
Write
MemRead
Mem
Writ
e
sub
11
X
X
3
2
X
$3
$2
X
11
$1
20
10
Mux
0
Mux
1
ALUOp
RegDst
ALUcontrol
M
WB
$3
$2
11
Mux
Mux
ALUAddress Read
dataData
memory
10
WB
Zero
Zero
Signextend
Signextend
Datamemory
Address
Clock 3
Clock 4
Example 2: Cycle 3
45
Instructionmemory
Address
Instruction[20– 16]
Mem
toR
eg
Branch
ALUSrc
4
Instruction[15– 0]
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
ALUresult
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
EX
M
WB
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: sub $11, $2, $3 EX: lw $10, . . . MEM: before<1> WB: before<2>
MEM/WB
IF: and $12, $4, $5
000
10
1100
010
11
000
1
00
00
0
0
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
Writeregister
Writedata 1
ALUresult
ALUcontrol
Shiftleft 2
RegW
rite
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: and $12, $2, $3 EX: sub $11, . . . MEM: lw $10, . . . WB: before<1>
MEM/WB
IF: or $13, $6, $7
000
10
1100
000
10
101
0
11
10
0
0
0
Mux
0
1
Add
PC
0Writedata
Mux
1
andControl
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
12
X
X
5
4
Instruction[20– 16]
Instruction[15– 0]
Instruction[15– 11]
X
$5
$4
X
12
Mem
Write
MemRead
Mem
Writ
e
sub
11
X
X
3
2
X
$3
$2
X
11
$1
20
10
Mux
0
Mux
1
ALUOp
RegDst
ALUcontrol
M
WB
$3
$2
11
Mux
Mux
ALUAddress Read
dataData
memory
10
WB
Zero
Zero
Signextend
Signextend
Datamemory
Address
Clock 3
Clock 4
Example 2: Cycle 4
46
Example 2: Cycle 5
47
Instructionmemory
Address
Instruction[20– 16]
Branch
ALUSrc
4
Instruction[15– 0]
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
ALUresult
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
EX
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: or $13, $6, $7 EX: and $12, . . . MEM: sub $11, . . . WB: lw $10, . . .
MEM/WB
IF: add $14, $8, $9
000
10
1100
000
10
101
0
10
00
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
1
ALUresult
ALUcontrol
Shiftleft 2
RegW
rite
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: add $14, $8, $9 EX: or $13, . . . MEM: and $12, . . . WB: sub $11, . . .
MEM/WB
IF: after<1>
000
10
1100
000
10
101
0
10
00
0
1
0
Mux
0
1
Add
PC
0Writedata
Mux
1
addControl
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
14
X
X
9
8
Instruction[20– 16]
Instruction[15– 0]
Instruction[15– 11]
X
$9
$8
X
14
Mem
Write
MemRead
Mem
Writ
e
or
13
X
X
7
6
X
$7
$6
X
13
$4
Mux
0
Mux
1
ALUOp
RegDst
ALUcontrol
M
WB
$7
$6
13
Mux
Mux
ALUReaddata
12
WB
11 10
10$5
12
WB
Mem
toR
eg
1
1
11
11
Writeregister
Writedata
Zero
Zero
Datamemory
Address
Datamemory
Address
Signextend
Signextend
Clock 5
Clock 6
Example 2: Cycle 6
48
Instructionmemory
Address
Instruction[20– 16]
Branch
ALUSrc
4
Instruction[15– 0]
0
1
Add Addresult
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
ALUresult
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
EX
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: or $13, $6, $7 EX: and $12, . . . MEM: sub $11, . . . WB: lw $10, . . .
MEM/WB
IF: add $14, $8, $9
000
10
1100
000
10
101
0
10
00
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
1
ALUresult
ALUcontrol
Shiftleft 2
RegW
rite
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: add $14, $8, $9 EX: or $13, . . . MEM: and $12, . . . WB: sub $11, . . .
MEM/WB
IF: after<1>
000
10
1100
000
10
101
0
10
00
0
1
0
Mux
0
1
Add
PC
0Writedata
Mux
1
addControl
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
14
X
X
9
8
Instruction[20– 16]
Instruction[15– 0]
Instruction[15– 11]
X
$9
$8
X
14
Mem
Write
MemRead
Mem
Writ
e
or
13
X
X
7
6
X
$7
$6
X
13
$4
Mux
0
Mux
1
ALUOp
RegDst
ALUcontrol
M
WB
$7
$6
13
Mux
Mux
ALUReaddata
12
WB
11 10
10$5
12
WB
Mem
toR
eg
1
1
11
11
Writeregister
Writedata
Zero
Zero
Datamemory
Address
Datamemory
Address
Signextend
Signextend
Clock 5
Clock 6
Example 2: Cycle 7
49
Fig. 6.34
Instructionmemory
Address
Instruction[20– 16]
Branch
ALUSrc
4
Instruction[15– 0]
0
1
Add Addresult
RegistersWriteregister
Writedata
ALUresult
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
Signextend
EX
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: after<1> EX: add $14, . . . MEM: or $13, . . . WB: and $12, . . .
MEM/WB
IF: after<2>
000
00
0000
000
10
101
0
10
00
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
RegW
rite
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .
MEM/WB
IF: after<3>
000
00
0000
000
00
000
0
10
00
0
1
0
Mux
0
1
Add
PC
0Writedata
Mux
1
Control
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction[20– 16]
Instruction[15– 0] Sign
extend
Instruction[15– 11]
Mem
Write
MemRead
Mem
Writ
e
$8
Mux
0
Mux
1
ALUOp
RegDst
ALUcontrol
M
WB
Mux
Mux
ALUReaddata
14
WB
13 12
12$9
14
WB
Mem
toR
eg
1
0
13
13
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2 Zero
Datamemory
Address
Datamemory
Address
Clock 7
Clock 8
Example 2: Cycle 8
50
Fig. 6.34
Instructionmemory
Address
Instruction[20– 16]
Branch
ALUSrc
4
Instruction[15– 0]
0
1
Add Addresult
RegistersWriteregister
Writedata
ALUresult
Shiftleft 2
RegW
rite
MemRead
Control
ALU
Instruction[15– 11]
Signextend
EX
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: after<1> EX: add $14, . . . MEM: or $13, . . . WB: and $12, . . .
MEM/WB
IF: after<2>
000
00
0000
000
10
101
0
10
00
0
Mux
0
1
Add
PC
0Writedata
Readdata
Mux
1
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
RegW
rite
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .
MEM/WB
IF: after<3>
000
00
0000
000
00
000
0
10
00
0
1
0
Mux
0
1
Add
PC
0Writedata
Mux
1
Control
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction[20– 16]
Instruction[15– 0] Sign
extend
Instruction[15– 11]
Mem
Write
MemRead
Mem
Writ
e
$8
Mux
0
Mux
1
ALUOp
RegDst
ALUcontrol
M
WB
Mux
Mux
ALUReaddata
14
WB
13 12
12$9
14
WB
Mem
toR
eg
1
0
13
13
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2 Zero
Datamemory
Address
Datamemory
Address
Clock 7
Clock 8
WB
EX
M
Instructionmemory
Address
Mem
toR
eg
ALUOp
Branch
RegDst
ALUSrc
4
0
0
1
Add Addresult
1
ALUresult
Zero
ALUcontrol
Shiftleft 2
Reg
Writ
e
M
WB
Inst
ruct
ion
IF/ID EX/MEMID/EX
ID: after<3> EX: after<2> MEM: after<1> WB: add $14, . . .
MEM/WB
IF: after<4>
000
00
0000
000
00
000
0
00
00
0
1
0
Mux
0
1
Add
PC
0Writedata
Mux
1
Control
Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction[20– 16]
Instruction[15– 0] Sign
extend
Instruction[15– 11]
MemRead
Mem
Write
Mux
Mux
ALUReaddata
WB
14
14
Writeregister
Writedata
Datamemory
Address
Clock 9
Example 2: Cycle 9
51
Summary of Pipeline Basics
Pipelining is a fundamental concept Multiple steps using distinct resources Utilize capabilities of datapath by pipelined
instruction processing Start next instruction while working on the
current one Limited by length of longest stage (plus
fill/flush) Need to detect and resolve hazards
What makes it easy in MIPS? All instructions are of the same length Just a few instruction formats Memory operands only in loads and stores
What makes pipelining hard? hazards
52
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use ( hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
53
Pipeline Hazards
Pipeline Hazards: Structural hazards: attempt to use the same
resource in two different ways at the same time Ex.: combined washer/dryer or folder busy doing
something else (watching TV) Data hazards: attempt to use item before ready
Instruction depends on result of prior instruction still in the pipeline
Control hazards: attempt to make decision before condition is evaluated
Ex.: wash football uniforms and need to see result of previous load to get proper detergent level
Branch instructions Can always resolve hazards by waiting
pipeline control must detect the hazard take action (or delay action) to resolve hazards
54
Mem
Instr.
Order
Time
Load
Instr 1
Instr 2
Instr 3
Instr 4A
LUMem Reg Mem Reg
AL
UMem Reg Mem Reg
AL
UMem Reg Mem RegA
LUReg Mem Reg
AL
UMem Reg Mem Reg
Use 2 memory: data memory and instruction memory
Structural Hazard: Single Memory
55
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruc
tion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Feedback Path: Data Hazard
56
Data Hazards
IM Reg
IM Reg
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
sub $2, $1, $3
Programexecutionorder(in instructions)
and $12, $2, $5
IM Reg DM Reg
IM DM Reg
IM DM Reg
CC 7 CC 8 CC 9
10 10 10 10 10/ -2 0 -20 -2 0 -20 -2 0
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Value of register $2:
DM Reg
Reg
Reg
Reg
DM
57
Types of Data Hazards
Three types: (inst. i1 followed by inst. i2) RAW (read after write):
i2 tries to read operand before i1 writes it WAR (write after read):
i2 tries to write operand before i1 reads it Gets wrong operand, e.g., autoincrement addr. Can’t happen in MIPS 5-stage pipeline because:
All instructions take 5 stages, and reads are always in stage 2, and writes are always in stage 5
WAW (write after write): i2 tries to write operand before i1 writes it Leaves wrong result ( i1’s not i2’s); occur only
in pipelines that write in more than one stage Can’t happen in MIPS 5-stage pipeline because:
All instructions take 5 stages, and writes are always in stage 5
58
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruc
tion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Feedback Path: Control Hazard
59
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use ( hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
60
Handling Data Hazards
Use simple, fixed designs Eliminate WAR by always fetching operands
early (ID) in pipeline Eliminate WAW by doing all write backs in
order (last stage, static) These features have a lot to do with ISA design
Internal forwarding in register file: Write in first half of clock and read in second
half Read delivers what is written, resolve hazard
between sub and add Detect and resolve remaining ones
Compiler inserts NOP (software solution) Forward (hardware solution) Stall (hardware solution)
61
Software Solution
Have compiler guarantee no hazards Where do we insert the NOPs?
sub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15, 100($2)
62
Data Hazards
IM Reg
IM Reg
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
sub $2, $1, $3
Programexecutionorder(in instructions)
and $12, $2, $5
IM Reg DM Reg
IM DM Reg
IM DM Reg
CC 7 CC 8 CC 9
10 10 10 10 10/ -2 0 -20 -2 0 -20 -2 0
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Value of register $2:
DM Reg
Reg
Reg
Reg
DM
Insert two nops
63
This really slows us down!
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use ( hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
64
Data Hazards : Forwarding
IM Reg
IM Reg
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
sub $2, $1, $3
Programexecutionorder(in instructions)
and $12, $2, $5
IM Reg DM Reg
IM DM Reg
IM DM Reg
CC 7 CC 8 CC 9
10 10 10 10 10/ -2
0 -20 -20 -20 -20
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Value of register $2:
DM Reg
Reg
Reg
Reg
DM
65
Datapath with Forwarding
66
Control: Detecting Data Hazards
Hazard conditions: 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
Two optimizations: Don’t forward if instruction does not write
register=> check if RegWrite is asserted
Don’t forward if destination register is $0=> check if RegisterRd = 0
67
Detecting Data Hazards (cont.)
Hazard conditions using control signals: At MEM stage:
EX/MEM.RegWrite and (EX/MEM.RegRd0) and (EX/MEM.RegRd=ID/EX.RegRs)
At WB stage:MEM/WB.RegWrite and (MEM/WB.RegRd0) and (MEM/WB.RegRd=ID/EX.RegRs)
(replace ID/EX.RegRt for ID/EX.RegRs for the other two conditions)
68
Datapath with Forwarding
69
Forwarding Logic
Forwarding: input to ALU from any pipe reg. Add multiplexors to ALU input Control forwarding in EX => carry Rs in ID/EX
Control signals for forwarding: If both WB and MEM forward, e.g., add $1,$1,$2; add $1,$1,$3; add $1,$1,$4; => let MEM forward
MEM hazard: if (EX/MEM.RegWrite and (EX/MEM.RegRd0)
and (EX/MEM.RegRd=ID/EX.RegRs)) ForwardA=10
WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegRd0)
and (EX/MEM.RegRd ID/EX.Reg.Rs)and (MEM/WB.RegRd=ID/EX.RegRs))
ForwardA=01(ID/EX.RegRt<->ID/EX.RegRs, ForwardB<-> ForwardA)
70
Example 3: Cycle 3
71
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
and $4, $2, $5 sub $2, $1, $3
ID/EX
before<1>
EX/MEM
before<2>
MEM/WB
or $4, $4, $2
Clock 3
2
5
10 10
$2
$5
5
2
4
$1
$3
3
1
2
Control
ALU
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
or $4, $4, $2 and $4, $2, $5
ID/EX
sub $2, . . .
EX/MEM
before<1>
MEM/WB
add $9, $4, $2
Clock 4
4
6
10 10
$4
$2
6
2
4
$2
$5
5
2
4
Control
ALU
10
2
WB
M
WB
Example 3: Cycle 4
72
Fig. 6.41
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
and $4, $2, $5 sub $2, $1, $3
ID/EX
before<1>
EX/MEM
before<2>
MEM/WB
or $4, $4, $2
Clock 3
2
5
10 10
$2
$5
5
2
4
$1
$3
3
1
2
Control
ALU
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
or $4, $4, $2 and $4, $2, $5
ID/EX
sub $2, . . .
EX/MEM
before<1>
MEM/WB
add $9, $4, $2
Clock 4
4
6
10 10
$4
$2
6
2
4
$2
$5
5
2
4
Control
ALU
10
2
WB
M
WB
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
add $9, $4, $2 or $4, $4, $2
ID/EX
and $4, . . .
EX/MEM
sub $2, . . .
MEM/WB
after<1>
Clock 5
4
2
10 10
$4
$2
2
4
9
$4
$2
4
2
24
Control
ALU
10
WB
2
1
PCInstruction
memory
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
after<1>after<2> add $9, $4, $2 or $4, . . .
EX/MEM
and $4, . . .
MEM/WB
Clock 6
10
$4
$2
2
4
9
ALU
10
4
4
WB
4
1
Registers
Inst
ruct
ion
IF/ID
ID/EX
4
Control
Example 3: Cycle 5
73
Example 3: Cycle 6
74
Fig. 6.42
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
add $9, $4, $2 or $4, $4, $2
ID/EX
and $4, . . .
EX/MEM
sub $2, . . .
MEM/WB
after<1>
Clock 5
4
2
10 10
$4
$2
2
4
9
$4
$2
4
2
24
Control
ALU
10
WB
2
1
PCInstruction
memory
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
after<1>after<2> add $9, $4, $2 or $4, . . .
EX/MEM
and $4, . . .
MEM/WB
Clock 6
10
$4
$2
2
4
9
ALU
10
4
4
WB
4
1
Registers
Inst
ruct
ion
IF/ID
ID/EX
4
Control
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware ) Stalls, load-use ( hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
75
lw can still cause a hazard: if is followed by an instruction to read the
loaded reg.
Reg
IM
Reg
Reg
IM
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
lw $2, 20($1)
Programexecutionorder(in instructions)
and $4, $2, $5
IM Reg DM Reg
IM DM Reg
IM DM Reg
CC 7 CC 8 CC 9
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
DM Reg
Reg
Reg
DM
Can't Always Forward
Use stalling or compiler to resolve
76
Stalling
Stall pipeline by keeping instructions in same stage and inserting an NOP instead
lw $2, 20($1)
Programexecutionorder(in instructions)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Reg
IM
Reg
Reg
IM DM
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6Time (in clock cycles)
IM Reg DM RegIM
IM DM Reg
IM DM Reg
CC 7 CC 8 CC 9 CC 10
DM Reg
RegReg
Reg
bubble
77
PCInstruction
memory
Registers
Mux
Mux
Mux
Control
ALU
EX
M
WB
M
WB
WB
ID/EX
EX/MEM
MEM/WB
Datamemory
Mux
Hazarddetection
unit
Forwardingunit
0
Mux
IF/ID
Inst
ruct
ion
ID/EX.MemRead
IF/ID
Write
PC
Write
ID/EX.RegisterRt
IF/ID.RegisterRd
IF/ID.RegisterRt
IF/ID.RegisterRt
IF/ID.RegisterRs
RtRs
Rd
Rt EX/MEM.RegisterRd
MEM/WB.RegisterRd
Datapath with Stalling Unit
Forwarding controls ALU inputs, hazard detection controls PC, IF/ID, control signals
78
Control: Handling Stalls
Hazard detection unit in ID to insert stall between a load instruction and its use: if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.registerRt)) stall the pipeline for one cycle(ID/EX.MemRead=1 indicates a load instruction)
How to stall? Stall instruction in IF and ID: not change PC
and IF/ID=> the stages re-execute the instructions
What to move into EX: insert an NOP by changing EX, MEM, WB control fields of ID/EX pipeline register to 0
as control signals propagate, all control signals to EX, MEM, WB are deasserted and no registers or memories are written
79
Example 4: Cycle 2
80Hazard
detectionunit
0
MuxIF
/ID
Write
PC
Write
ID/EX.RegisterRt
lw $2, 20($1)
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
and $4, $2, $5
ID/EX
before<1>
EX/MEM
before<2>
MEM/WB
or $4, $4, $2
Clock 3
2
5
2
500 11
$2
$5
5
2
4
$1
$X
X
1
2
Control
ALU
M
WB
Hazarddetection
unit
0
MuxIF
/ID
Write
PC
Write
ID/EX.RegisterRt
ID/EX.MemRead
ID/EX.MemRead
M
WB
$1
$X
X
1
2
before<3>
PCInstruction
memory
Registers
Mux
Mux
Mux
EX WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
ID/EX
EX/MEM
MEM/WB
and $4, $2, $5 lw $2, 20($1) before<1> before<2>
Clock 2
1
1
X
X11
Control
ALU
M
WB
Example 4: Cycle 3
81
Hazarddetection
unit
0
MuxIF/ID
Write
PC
Write
ID/EX.RegisterRt
lw $2, 20($1)
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
and $4, $2, $5
ID/EX
before<1>
EX/MEM
before<2>
MEM/WB
or $4, $4, $2
Clock 3
2
5
2
500 11
$2
$5
5
2
4
$1
$X
X
1
2
Control
ALU
M
WB
Hazarddetection
unit
0
MuxIF
/ID
Write
PC
Write
ID/EX.RegisterRt
ID/EX.MemRead
ID/EX.MemRead
M
WB
$1
$X
X
1
2
before<3>
PCInstruction
memory
Registers
Mux
Mux
Mux
EX WB
Datamemory
Mux
Forwardingunit
Inst
ruct
ion
IF/ID
ID/EX
EX/MEM
MEM/WB
and $4, $2, $5 lw $2, 20($1) before<1> before<2>
Clock 2
1
1
X
X11
Control
ALU
M
WB
Hazarddetection
unit
0
MuxIF
/IDW
rite
PC
Writ
e
ID/EX.RegisterRt
$2
$5
5
2
2
2
4
WB
Hazarddetection
unit
0
MuxIF
/IDW
rite
PC
Writ
e
ID/EX.RegisterRt
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
Datamemory
Mux
Inst
ruct
ion
IF/ID
and $4, $2, $5 bubble
ID/EX
lw $2, . . .
EX/MEM
before<1>
MEM/WB
Clock 4
2
2
5
510
11
00
$2
$5
5
2
4
Control
ALU
M
WB
bubble lw $2, . . .
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
Forwardingunit
Inst
ruct
ion
IF/ID
and $4, $2, $5
ID/EX
EX/MEM
MEM/WB
add $9, $4, $2
Clock 5
2
210 10
11
$4
$2
2
4
4
4
2
4
$2
$5
5
2
4
Control
ALU
0
WB
ID/EX.MemRead
ID/EX.MemRead
or $4, $4, $2
or $4, $4, $2
Example 4: Cycle 4
82
Hazarddetection
unit
0
MuxIF
/IDW
rite
PC
Writ
e
ID/EX.RegisterRt
$2
$5
5
2
2
2
4
WB
Hazarddetection
unit
0
MuxIF
/IDW
rite
PC
Writ
eID/EX.RegisterRt
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
Datamemory
Mux
Inst
ruct
ion
IF/ID
and $4, $2, $5 bubble
ID/EX
lw $2, . . .
EX/MEM
before<1>
MEM/WB
Clock 4
2
2
5
510
11
00
$2
$5
5
2
4
Control
ALU
M
WB
bubble lw $2, . . .
PCInstruction
memory
Registers
Mux
Mux
Mux
EX
M
WB
M
WB
Datamemory
Mux
Forwardingunit
Forwardingunit
Inst
ruct
ion
IF/ID
and $4, $2, $5
ID/EX
EX/MEM
MEM/WB
add $9, $4, $2
Clock 5
2
210 10
11
$4
$2
2
4
4
4
2
4
$2
$5
5
2
4
Control
ALU
0
WB
ID/EX.MemRead
ID/EX.MemRead
or $4, $4, $2
or $4, $4, $2
Example 4: Cycle 5
83
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use ( hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
84
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruc
tion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Feedback Path
85
Pipeline Datapath with Control Signals
PC
Instructionmemory
Address
Inst
ruct
ion
Instruction[20– 16]
MemtoReg
ALUOp
Branch
RegDst
ALUSrc
4
16 32Instruction[15– 0]
0
0Registers
Writeregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Signextend
Mux
1Write
data
Read
data Mux
1
ALUcontrol
RegWrite
MemRead
Instruction[15– 11]
6
IF/ID ID/EX EX/MEM MEM/WB
MemWrite
Address
Datamemory
PCSrc
Zero
AddAdd
result
Shiftleft 2
ALUresult
ALU
Zero
Add
0
1
Mux
0
1
Mux
86
When decide to branch, other inst. are in pipeline!
Reg
Reg
CC 1
Time (in clock cycles)
40 beq $1, $3, 7
Programexecutionorder(in instructions)
IM Reg
IM DM
IM DM
IM DM
DM
DM Reg
Reg Reg
Reg
Reg
RegIM
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)
CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9
Reg
Branch Hazards
87
Handling Branch Hazard
Predict branch always not taken Need to add hardware for flushing inst. if wrong Branch decision made at MEM => need to flush
instruction in IF/ID, ID/EX by changing control values to 0
Reduce delay of taken branch by moving branch execution earlier in the pipeline Move up branch address calculation to ID Check branch equality at ID (using XOR) by
comparing the two registers read during ID Branch decision made at ID => one instruction
to flush Add a control signal, IF.Flush, to zero instruction
field of IF/ID => making the instruction an NOP
88
Pipeline with Flushing
PC Instructionmemory
4
Registers
Mux
Mux
Mux
ALU
EX
M
WB
M
WB
WB
ID/EX
0
EX/MEM
MEM/WB
Datamemory
Mux
Hazarddetection
unit
Forwardingunit
IF.Flush
IF/ID
Signextend
Control
Mux
=
Shiftleft 2
Mux
89
Example 5: Cycle 3
90
PCInstruction
memory
4
Registers
Signextend
Mux
Mux
Control
EX
M
WB
M
WB
WB
Mux
Hazarddetection
unit
Forwardingunit
Mux
IF.Flush
IF/ID
and $12, $2, $5 beq $1, $3, 7 sub $10, $4, $8
MEM/WB
EX/MEM
ID/EX
Clock 3
72 44
48 44
28
7
$1
$3
10
48
72
72
0
Mux
0
$4
$8
ALUData
memory
bubble (nop)lw $4, 50($7)
Clock 4
Mux
Shiftleft 2
before<1>
beq $1, $3, 7 sub $10, . . . before<1>
before<2>
=
PC Instructionmemory
4
Registers
Signextend
Mux
Mux
Control
EX
M
WB
M
WB
WB
Mux
Hazarddetection
unit
Forwardingunit
IF.Flush
IF/ID
MEM/WB
EX/MEM
ID/EX
76 72
76 72
$1
$3
10
76
ALUData
memory
Mux
Shiftleft 2
=
Example 5: Cycle 4
91
PCInstruction
memory
4
Registers
Signextend
Mux
Mux
Control
EX
M
WB
M
WB
WB
Mux
Hazarddetection
unit
Forwardingunit
Mux
IF.Flush
IF/ID
and $12, $2, $5 beq $1, $3, 7 sub $10, $4, $8
MEM/WB
EX/MEM
ID/EX
Clock 3
72 44
48 44
28
7
$1
$3
10
48
72
72
0
Mux
0
$4
$8
ALUData
memory
bubble (nop)lw $4, 50($7)
Clock 4
Mux
Shiftleft 2
before<1>
beq $1, $3, 7 sub $10, . . . before<1>
before<2>
=
PC Instructionmemory
4
Registers
Signextend
Mux
Mux
Control
EX
M
WB
M
WB
WB
Mux
Hazarddetection
unit
Forwardingunit
IF.Flush
IF/ID
MEM/WB
EX/MEM
ID/EX
76 72
76 72
$1
$3
10
76
ALUData
memory
Mux
Shiftleft 2
=
Dynamic Branch Prediction
In deeper and superscalar pipelines, branch penalty is more significant
Use dynamic prediction (e.g. loop) Branch prediction buffer (i.e., branch history
table) Indexed by lower-order bits of branch
instruction addresses Stores outcome (taken/not taken) To execute a branch
Check table, expect the same outcome Start fetching from fall-through or target If wrong, flush pipeline and flip prediction
92
Calculating the Branch Target
Even with predictor, still need to calculate the target address 1-cycle penalty for a taken branch
Another approach: branch target buffer Cache of target addresses (cache tagged by
address of branch instruction) Cache access by PC indexing when instruction
is fetched If hit and instruction is branch predicted taken,
can fetch target immediately
93
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use (hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
94
Exceptions and Interrupts
“ Unexpected” events requiring changein flow of control (jump to OS) Different ISAs use the terms differently
Exception Arises within the CPU e.g., undefined opcode, overflow, syscall,
… Interrupt
From an external I/O controller In MIPS, exceptions managed by a System
Control Coprocessor (CP0) Dealing with them without sacrificing
performance is hard
§4.9 Exceptions
95
Handling Exceptions
Before jump to OS:
Save PC of offending (or interrupted) instruction In MIPS: Exception Program Counter (EPC)
Save indication of the problem In MIPS: Cause register We’ll assume 1-bit
0 for undefined opcode, 1 for overflow Jump to handler at 8000 00180
96
Handler Actions
In OS:
Read cause, and transfer to relevant handler Determine action required If restartable
Take corrective action use EPC to return to program
Otherwise Terminate program Report error using EPC, cause, …
97
Exceptions in a Pipeline
Another form of control hazard Consider overflow on add in EX stage
add $1, $2, $1 Prevent $1 from being clobbered Complete previous instructions Flush add and subsequent instructions Set Cause and EPC register values Transfer control to handler
Similar to mispredicted branch Use much of the same hardware
98
Pipeline with Exceptions
99
Exception Example
Exception on add in40 sub $11, $2, $444 and $12, $2, $548 or $13, $2, $64C add $1, $2, $150 slt $15, $6, $754 lw $16, 50($7)…
Handler80000180 sw $25, 1000($0)80000184 sw $26, 1004($0)…
100
Exception Example
101
Exception Example
102
Outline
An overview of pipelining A pipelined datapath Pipelined control Hazards: types of hazard Handling data hazards
Inserting NOP ( software) Forwarding, R-Type-use ( hardware) Stalls, load-use (hardware)
Handling branch hazards Exceptions Superscalar and dynamic pipelining
103
Instruction-Level Parallelism (ILP)
Pipelining: executing multiple instructions in parallel
To increase ILP Deeper pipeline
Less work per stage shorter clock cycle Multiple issue
Replicate pipeline stages multiple pipelines Start multiple instructions per clock cycle CPI < 1, so use Instructions Per Cycle (IPC) E.g., 4GHz 4-way multiple-issue (Assume 1
instruction completed in 1 cycle) 16 BIPS, peak CPI = 0.25, peak IPC = 4 But dependencies reduce this in practice
§4.10 Parallelism
and Advanced Instruction Level P
arallelism
104
Multiple Issue
Static multiple issue Compiler groups instructions to be issued
together Packages them into “issue slots” Compiler detects and avoids hazards
Dynamic multiple issue CPU examines instruction stream and chooses
instructions to issue each cycle CPU resolves hazards using advanced
techniques at runtime Compiler can help by reordering instructions
105
Speculation
“ Guess” what to do with an instruction Start operation as soon as possible Check whether guess was right
If so, complete the operation If not, roll-back and do the right thing
Common to static and dynamic multiple issue Examples
Speculate on branch outcome Roll back if path taken is different
Speculate on load A store that proceeds a load does not refer to
the same address (Perform load first) Roll back if location is updated
106
Compiler/Hardware Speculation
Compiler can reorder instructions e.g., move load before store Can include “fix-up” instructions to recover
from incorrect guess Hardware can look ahead for instructions to
execute Buffer results until it determines they are
actually needed Flush buffers on incorrect speculation
107
Multiple Issue
Static multiple issue (compiler approach) Dynamic multiple issue (hardware approach)
108
Static Multiple Issue
Compiler groups instructions into “issue packets” Group of instructions that can be issued on a
single cycle Determined by pipeline resources required
Think of an issue packet as a very long instruction Specifies multiple concurrent operations Very Long Instruction Word (VLIW)
109
Scheduling Static Multiple Issue
Compiler must remove some/all hazards Reorder instructions into issue packets No dependencies with a packet Possibly some dependencies between packets
Varies between ISAs; compiler must know! Pad with nop if necessary
110
MIPS with Static Dual Issue
Two-issue packets One ALU/branch instruction One load/store instruction 64-bit aligned
ALU/branch, then load/store Pad an unused instruction with nop
Address
Instruction type
Pipeline Stages
n ALU/branch IF ID EX MEM WB
n + 4 Load/store IF ID EX MEM WB
n + 8 ALU/branch IF ID EX MEM WB
n + 12 Load/store IF ID EX MEM WB
n + 16 ALU/branch IF ID EX MEM WB
n + 20 Load/store IF ID EX MEM WB
111
MIPS with Static Dual Issue
112
Multiple Issue
Static multiple issue (compiler approach) Dynamic multiple issue (hardware approach)
113
Dynamic Multiple Issue
Simplest version of“ Superscalar” processors CPU decides whether to issue 0, 1, 2, … each
cycle Avoiding structural and data hazards
Avoids the need for compiler scheduling Though it may still help Code semantics ensured by the CPU
114
Dynamic Pipeline Scheduling
Allow the CPU to execute instructions out of order to avoid stalls But commit result to registers in order
Examplelw $t0, 20($s2)addu $t1, $t0, $t2sub $s4, $s4, $t3slti $t5, $s4, 20
Can start sub while addu is waiting for lw
115
Dynamically Scheduled CPU
116
Results also sent to any waiting
reservation stations
Reorders buffer for register writes
Can supply operands for issued
instructions
Preserves dependencies
Hold pending operands
Why Do Dynamic Scheduling?
Why not just let the compiler schedule code?1. Not all stalls are predicable
e.g., cache misses2. Can’t always schedule around branches
Branch outcome is dynamically determined3. Different implementations of an ISA have
different latencies and hazards (re-compile ?)
117
Fallacies
Pipelining is easy (!) The basic idea is easy The devil is in the details
e.g., detecting data hazards Pipelining is independent of technology
So why haven’t we always done pipelining? More transistors make more advanced
techniques feasible Pipeline-related ISA design needs to take
account of technology trends e.g., predicted instructions
§4.13 Fallacies and P
itfalls
118
Pitfalls
Poor ISA design can make pipelining harder e.g., complex instruction sets (VAX, IA-32)
Significant overhead to make pipelining work IA-32 micro-op approach
e.g., complex addressing modes Register update side effects, memory indirection
119
Concluding Remarks
ISA influences design of datapath and control Datapath and control influence design of ISA Pipelining improves instruction throughput
using parallelism More instructions completed per second Latency for each instruction not reduced
Hazards: structural, data, control Multiple issue and dynamic scheduling (ILP)
Dependencies limit achievable parallelism Complexity leads to the power wall
§4.14 Concluding R
emarks
120