coe 308
COE 308
King Fahd University of Petroleum and Minerals
Computer Engineering Department
College of Computer Science and Engineering
Pipeline 1
COE 308
Enhancing Performance with Pipelining
Pipeline 2
Laundry Example
Student doing laundry (processing one load):
• Washing a single load of laundry
• Drying a single load of laundry
• Folding a single load
• Putting the load in the closet
Pipeline 3
Sequential Laundry
[Figure: timeline from 6 PM to 2 AM showing loads A, B, C, D in task order]
Sequential laundry takes 8 hours for four loads of wash …
Pipeline 4
Pipelined Laundry
… while pipelined laundry takes just 3.5 hours
[Figure: timeline from 6 PM to 2 AM showing loads A, B, C, D in task order]
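The two totals can be checked with a few lines of Python. This is a minimal sketch that assumes each of the four steps takes 30 minutes (an assumption, chosen because it is consistent with 8 hours for four sequential loads); the function names are hypothetical.

```python
def sequential_hours(loads: int, steps: int, step_hours: float) -> float:
    """Each load finishes all of its steps before the next load starts."""
    return loads * steps * step_hours


def pipelined_hours(loads: int, steps: int, step_hours: float) -> float:
    """The first load needs `steps` slots; every later load adds one slot."""
    return (steps + loads - 1) * step_hours


STEP_HOURS = 0.5  # assumption: 30 minutes per step

print(sequential_hours(4, 4, STEP_HOURS))  # 8.0
print(pipelined_hours(4, 4, STEP_HOURS))   # 3.5
```

The same `steps + loads - 1` expression gives the cycle count of an instruction pipeline when every stage takes one clock cycle.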
Pipeline 5
Pipelining Analysis
Pipelining is possible because:
• All four laundry steps use independent stations
– Washing uses the washer, which is independent from the dryer used in the drying step and from the table used in the folding step.
– This means that once the washing step is done, the washer can be used for another load while the current load is drying in the dryer
• All steps are always used in the same order
– Washing always occurs before drying, as it is not correct to dry clothes that haven't been washed yet
– Drying always occurs before folding
– …
Pipeline 6
Pipelining Processor Execution
• Processor executes instructions
• Can the instruction execution process be pipelined?
– Yes, because it can be divided into steps
– And because the order of the execution steps is the same (most of the time)
• Instruction execution steps
– Fetch instruction from memory
– Read registers while decoding the instruction
– Execute the operation
– Access an operand in data memory
– Write the result into a register
Pipeline 7
Pipeline Stages
Instruction execution steps are called pipeline stages:
• Instruction Fetch (IF)
• Instruction Decode (ID)
• EXecute operation (EX)
• MEMory access (MEM)
• Write Back the result (WB)
[Figure: the five stages in order: IF, ID, EX, MEM, WB]
Pipeline 8
Processor Pipeline
A pipeline is well represented as a timing diagram (as in the laundry example)
The following sequence is represented:
add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9
addi $10, $6, $3

Clock Cycle  1   2   3   4   5   6   7   8   9
add          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB
and                  IF  ID  EX  MEM WB
or                       IF  ID  EX  MEM WB
addi                         IF  ID  EX  MEM WB

Five instructions are executed in 9 cycles
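The 9-cycle figure follows from the shape of the diagram: the first instruction needs five cycles and each later one finishes one cycle after its predecessor. The sketch below reproduces that count and prints a staggered diagram; it assumes an ideal pipeline with no stalls, and the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles needed when one instruction enters the pipeline per cycle."""
    return n_stages + n_instructions - 1


def timing_rows(names):
    """Instruction number i (0-based) is shifted right by i empty cycles."""
    return [(name, [""] * i + STAGES) for i, name in enumerate(names)]


def render(names) -> str:
    """Format the rows as fixed-width text, one instruction per line."""
    lines = []
    for name, cells in timing_rows(names):
        lines.append(f"{name:<5}" + "".join(f"{c:<4}" for c in cells).rstrip())
    return "\n".join(lines)


program = ["add", "sub", "and", "or", "addi"]
print(render(program))
print(total_cycles(len(program)))  # 9
```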
Pipeline 9
Data Dependency Hazard
Examine the following instructions:
add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9
There is a dependency between add and sub on register $1: sub uses it after add modifies it.

Clock Cycle  1   2   3   4   5   6
add          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB

The result of the add instruction is not written to register $1 until the WB stage.
However, the sub instruction reads the value of register $1 during the ID stage.
Problem: The sub instruction will read the wrong value of register $1 because the correct value has not been written there yet.
Pipeline 10
Types of Dependencies
All cases of data dependencies should be analyzed to see whether they cause any malfunction in the pipeline context.
Data dependency cases:
• Read After Write (RAW)
• Read After Read (RAR)
• Write After Write (WAW)
• Write After Read (WAR)
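The four cases can be told apart mechanically by comparing the destination and source registers of two instructions. The following is a minimal sketch, not part of the course material: the `Instr` tuple and the `classify` function are hypothetical names, and the pair used is the add/sub pair from the hazard example.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def classify(first: Instr, second: Instr) -> Set[str]:
    """Return every dependency type that `second` has on the earlier `first`."""
    kinds = set()
    if first.dest in second.srcs:
        kinds.add("RAW")  # second reads what first writes
    if set(first.srcs) & set(second.srcs):
        kinds.add("RAR")  # both read the same register
    if first.dest == second.dest:
        kinds.add("WAW")  # both write the same register
    if second.dest in first.srcs:
        kinds.add("WAR")  # second writes what first reads
    return kinds


add = Instr("add", "$1", ("$3", "$5"))
sub = Instr("sub", "$3", ("$1", "$4"))
print(sorted(classify(add, sub)))  # ['RAW', 'WAR']
```

One pair of instructions can carry several dependency types at once: here sub reads $1 (RAW) and also overwrites $3, which add reads (WAR).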
Pipeline 11
RAW Dependency
add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9
Read After Write (RAW) dependencies
An instruction uses as a source a register that is the destination of a previous instruction, so it needs to read the register's value while that value is still waiting to be written by the previous instruction.
Problem: The next instruction(s) will fetch the wrong values of the dependent registers because the correct values have not been written back yet.
Pipeline 12
RAR Dependency
add $1, $3, $5
sub $3, $5, $4
and $2, $4, $1
or $7, $1, $9
Read After Read (RAR) dependencies
Two consecutive instructions use the same register as a source operand
No Problem: As long as the registers are not modified, pipelining does not affect the normal execution process in this case
Pipeline 13
WAW Dependency
add $1, $3, $5
sub $1, $5, $4
and $4, $4, $1
or $4, $1, $9
Write After Write (WAW) dependencies
Two consecutive instructions use the same register as a destination operand
No Problem: Writes occur during the last pipeline stage, and no inconsistency results from this situation because the instruction execution order is maintained
Pipeline 14
WAR Dependency
add $1, $3, $5
sub $3, $5, $2
and $2, $4, $1
or $7, $1, $9
Write After Read (WAR) dependencies
A later instruction uses as its destination a register that a previous instruction uses as a source operand
No Problem: The read occurs in the ID stage and the write occurs in the WB stage, which means that the order of operations is not altered by the pipeline structure
Pipeline 15
RAW Dependency Cases
i: add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9
Case 1 dependency between instruction i and instruction i+1
Case 2 dependency between instruction i and instruction i+2
Case 3 dependency between instruction i and instruction i+3
Every case needs to be checked in order to determine whether it poses a real problem or not
Pipeline 16
RAW Dependency Case 1
i: add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9
Case 1: dependency between instruction i and instruction i+1

Clock Cycle  1   2   3   4   5   6
add          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB

Operand is fetched BEFORE it is written back
Pipeline 17
RAW Dependency Case 2
i: add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9
Case 2: dependency between instruction i and instruction i+2

Clock Cycle  1   2   3   4   5   6   7
add          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB
and                  IF  ID  EX  MEM WB

Operand is fetched BEFORE it is written back
Pipeline 18
RAW Dependency Case 3
i: add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9
Case 3: dependency between instruction i and instruction i+3

Clock Cycle  1   2   3   4   5   6   7   8
add          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB
and                  IF  ID  EX  MEM WB
or                       IF  ID  EX  MEM WB

Operand is fetched AT THE SAME TIME it is written back
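The three cases come down to comparing two cycle numbers: the cycle in which instruction i reaches WB and the cycle in which the dependent instruction reaches ID. A small sketch of that comparison, assuming one instruction enters IF per cycle and no stalls (function names are hypothetical):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(issue_index: int, stage: str) -> int:
    """Instruction number issue_index (0-based) enters IF in cycle issue_index + 1."""
    return issue_index + 1 + STAGES.index(stage)


def raw_case(distance: int) -> str:
    """Compare the consumer's ID cycle with the producer's WB cycle."""
    write = stage_cycle(0, "WB")
    read = stage_cycle(distance, "ID")
    if read < write:
        return "before"
    if read == write:
        return "same cycle"
    return "after"


for d in (1, 2, 3):
    print(f"i and i+{d}: operand is read {raw_case(d)} the write-back")
```

With the producer at index 0, WB falls in cycle 5, while ID falls in cycles 3, 4 and 5 for distances 1, 2 and 3.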
Pipeline 19
Register File Model
Case 3 does not pose a problem because we assume that, in the register file, writes occur BEFORE reads
This is only true if we use the falling edge of the clock to write
[Figure: clock waveform during the ID stage, annotated "Write is prepared here", "Write occurs here" and "Read occurs here"]
Pipeline 20
Data Dependency Solutions
• A data dependency between instructions causes operands to be fetched at the wrong time.
• The obvious remedy is to DELAY the fetch of operands until after the correct value is written in the register file
– In software, by inserting NOP instructions
– In hardware, by stalling the pipeline
Pipeline 21
NOP Insertion
Insertion of two NOP instructions will solve the data dependency problem
add $1, $3, $5
nop
nop
sub $3, $1, $4
and $2, $5, $1

Clock Cycle  1   2   3   4   5   6   7   8   9
add          IF  ID  EX  MEM WB
nop              IF  ID  EX  MEM WB
nop                  IF  ID  EX  MEM WB
sub                      IF  ID  EX  MEM WB
and                          IF  ID  EX  MEM WB
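The software remedy can be sketched as a pass over the instruction list that pads each RAW dependency until the consumer sits three slots after its producer (the Case 3 distance, which the register file model makes safe). This is an illustrative sketch, not a real compiler pass; the tuple layout and the `insert_nops` name are assumptions.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad with NOPs so every reader is at least min_distance slots after its writer.

    Each instruction is a tuple (op, dest, srcs).
    """
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Only the last min_distance - 1 emitted slots can still be too close.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("add", "$1", ("$3", "$5")),
    ("sub", "$3", ("$1", "$4")),
    ("and", "$2", ("$5", "$1")),
]
print([op for op, _, _ in insert_nops(program)])
# ['add', 'nop', 'nop', 'sub', 'and']
```

Once the two NOPs push sub to distance 3, the and instruction is already at distance 4 from add and needs no padding of its own.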
Pipeline 22
Pipeline Stall
Delaying the fetch of the operands can also be implemented in hardware
add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9
addi $10, $6, $3

Clock Cycle  1   2   3   4   5   6   7   8
add          IF  ID  EX  MEM WB
sub              IF  IF  IF  ID  EX  MEM WB
and                          IF  ID  EX  MEM
or                               IF  ID  EX
addi                                 IF  ID

The instruction sub is maintained in the IF stage for two extra clock cycles
It is equivalent to …
Pipeline 23
Pipeline Stall
… inserting bubbles in the pipeline

Clock Cycle  1   2   3   4   5   6   7   8
add          IF  ID  EX  MEM WB
(bubble)             ID  EX  MEM WB
(bubble)                 ID  EX  MEM WB
sub              IF  IF  IF  ID  EX  MEM WB
or                           IF  ID  EX  MEM

The instruction sub is maintained in the IF stage for two extra clock cycles, while virtual nop instructions are inserted in the pipeline (as bubbles)
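The hardware stall can be modelled by computing, for each instruction, the earliest cycle in which its ID stage may run: one cycle after the previous instruction's ID, and no earlier than the WB cycle of any register it reads. The sketch below assumes the write-before-read register file and no forwarding; `schedule` and `total_cycles` are hypothetical names.

```python
def schedule(program):
    """Return the ID-stage cycle of each instruction under hardware stalls.

    Each instruction is a tuple (op, dest, srcs).
    """
    ready = {}     # register -> cycle in which its producer is in WB
    id_cycles = []
    prev_id = 1    # the first instruction is fetched in cycle 1, decoded in cycle 2
    for op, dest, srcs in program:
        earliest = prev_id + 1
        for reg in srcs:
            earliest = max(earliest, ready.get(reg, 0))
        id_cycles.append(earliest)
        if dest is not None:
            ready[dest] = earliest + 3  # EX, MEM, then WB
        prev_id = earliest
    return id_cycles


def total_cycles(program):
    """The last instruction completes WB three cycles after its ID stage."""
    return schedule(program)[-1] + 3


program = [
    ("add", "$1", ("$3", "$5")),
    ("sub", "$3", ("$1", "$4")),
    ("and", "$2", ("$5", "$1")),
    ("or", "$7", ("$1", "$9")),
    ("addi", "$10", ("$6", "$3")),  # operands as listed in the example
]
print(schedule(program))      # [2, 5, 6, 7, 8]
print(total_cycles(program))  # 11
```

The sub instruction decodes in cycle 5 instead of cycle 3, which is the two-cycle stall; the sequence takes 11 cycles instead of the 9 of the ideal pipeline.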
Pipeline 24
Branch Hazard
Examine the following instructions:
beq $1, $3, Target
sub $3, $1, $4
and $2, $5, $1
...
Target: or $3, $5, $9
In the case the branch is taken, the instructions sub and and are wrongfully executed because they are fetched BEFORE the branch decision is made
Problem: Modification of the Program Logic: Unacceptable Behavior

Clock Cycle  1   2   3   4   5   6   7   8
beq          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB
and                  IF  ID  EX  MEM WB
or                       IF  ID  EX  MEM WB

Branch decision is taken and Target is fetched
Pipeline 25
Branch Hazard Solution
The solution is either to:
• Not let the instructions after the branch finish execution in the case the branch is taken
– Instruction transformation into nops (in hardware)
• Put instructions which do not disturb the logic of the program after the branch instruction, so that their execution will not modify the logic of the program
– Insertion of nop instructions after each branch instruction (by the compiler)
Pipeline 26
NOP forcing

Clock Cycle  1   2   3   4   5   6   7   8
beq          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB      (transformed into NOP after branch taken)
and                  IF  ID  EX  MEM WB  (transformed into NOP after branch taken)
or                       IF  ID  EX  MEM WB

Branch decision is taken and Target is fetched
After the branch is taken, the following instructions are forced to NOP instructions for the subsequent pipeline stages until the branch target instruction is fetched. A NOP has no effect. It is also said that instruction execution is killed
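The effect of NOP forcing on the instruction stream can be sketched as a trace in which the two instructions fetched behind a taken branch are marked as killed. This is a simplified illustration that assumes a two-instruction branch delay (as in the diagram) and a forward branch; the instructions elided by "..." in the example are left out, and all names are hypothetical.

```python
BRANCH_DELAY = 2  # assumption: two instructions are fetched before the decision


def fetch_trace(program, labels, taken):
    """Return (op, status) pairs in fetch order.

    program: list of (op, target_label_or_None); labels: label -> index.
    Only forward branches are handled in this sketch.
    """
    trace = []
    pc = 0
    while pc < len(program):
        op, target = program[pc]
        trace.append((op, "executed"))
        if target is not None and taken:
            # Instructions already fetched behind the branch become NOPs.
            for k in range(1, BRANCH_DELAY + 1):
                if pc + k < len(program):
                    trace.append((program[pc + k][0], "killed"))
            pc = labels[target]
        else:
            pc += 1
    return trace


program = [("beq", "Target"), ("sub", None), ("and", None), ("or", None)]
labels = {"Target": 3}  # the elided instructions before Target are omitted

print(fetch_trace(program, labels, taken=True))
print(fetch_trace(program, labels, taken=False))
```

When the branch is not taken, nothing is killed and sub and and execute normally, so the penalty is paid only on taken branches.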
Pipeline 27
NOP Insertion
beq $1, $3, Target
sub $3, $1, $4
and $2, $5, $1
...
Target: or $3, $5, $9

Clock Cycle  1   2   3   4   5   6   7   8
beq          IF  ID  EX  MEM WB
nop              IF  ID  EX  MEM WB
nop                  IF  ID  EX  MEM WB
or                       IF  ID  EX  MEM WB

Insertion of NOP instructions, by the compiler, after each branch instruction, does not disturb the logic of the program.
Pipeline 28
Delayed Branch
• Insertion of NOP instructions introduces a substantial overhead that increases the instruction count significantly.
• The idea is to move actual instructions from the area before the branch into the slots after the branch, filling the nop slots without modifying the logic of the program

Original code
xor $2, $2, $5        No dependency
and $1, $7, $8        Register $1 used by beq
sub $10, $6, $4       No dependency
add $3, $6, $7        Register $3 used by beq
beq $1, $3, Target
sub $3, $1, $4
and $2, $5, $1

Transformed code
and $1, $7, $8
add $3, $6, $7
beq $1, $3, Target
xor $2, $2, $5
sub $10, $6, $4
sub $3, $1, $4
and $2, $5, $1
Pipeline 29
Delayed Branch
Consider the transformed code obtained after moving the xor and sub instructions after the beq instruction:
and $1, $7, $8
add $3, $6, $7
beq $1, $3, Target    <- a programmer who reads the code without any idea about the execution will think that the branch occurs here
xor $2, $2, $5
sub $10, $6, $4       <- the execution will actually make the branch take effect here
sub $3, $1, $4
and $2, $5, $1
So when the branch is taken, the xor and sub instructions are executed, while the second sub and the and instructions are not
The branch instruction and the branch execution are separated by a two-instruction delay; that is why it is called a Delayed Branch
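The delayed-branch behaviour can be checked by listing the instructions that execute for a taken and a not-taken branch. The sketch below assumes two delay slots that always execute; the Target instruction is borrowed from the earlier branch hazard example as a stand-in, and the function name is hypothetical.

```python
def executed(program, labels, taken, delay_slots=2):
    """Return the instructions executed, in order, on a delayed-branch machine.

    program: list of (text, target_label_or_None); labels: label -> index.
    Only forward branches are handled in this sketch.
    """
    order = []
    pc = 0
    while pc < len(program):
        text, target = program[pc]
        order.append(text)
        if target is not None:
            # Delay-slot instructions run whether or not the branch is taken.
            slots = program[pc + 1 : pc + 1 + delay_slots]
            order.extend(t for t, _ in slots)
            pc = labels[target] if taken else pc + 1 + delay_slots
        else:
            pc += 1
    return order


program = [
    ("and $1, $7, $8", None),
    ("add $3, $6, $7", None),
    ("beq $1, $3, Target", "Target"),
    ("xor $2, $2, $5", None),       # delay slot 1
    ("sub $10, $6, $4", None),      # delay slot 2
    ("sub $3, $1, $4", None),
    ("and $2, $5, $1", None),
    ("or $3, $5, $9", None),        # assumed Target instruction
]
labels = {"Target": 7}

print(executed(program, labels, taken=True))
print(executed(program, labels, taken=False))
```

In both runs xor and the first sub execute exactly once, which is what the original ordering required, so moving them into the delay slots preserves the program logic.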
Pipeline 30
Pipelined Datapath
Pipeline 31
Inserting Pipeline Registers
Pipeline 32
Writing Back the Result
Pipeline 33
Destination Register Specifier ?
Pipeline 34
Branch Logic
Pipeline 35
Pipelined Control
Pipeline 36
Data Hazards and Forwarding
Pipeline 37
Forwarding Unit