COE 308: Enhancing Performance with Pipelining
King Fahd University of Petroleum and Minerals
College of Computer Science and Engineering
Computer Engineering Department

Upload: cedric-craig

Post on 03-Jan-2016

Page 1: COE 308

Enhancing Performance with Pipelining

Page 2: COE 308

Laundry Example

Student doing laundry (processing one load):

• Washing a single load of laundry
• Drying a single load of laundry
• Folding a single load
• Putting the load in the closet

Page 3: COE 308

Sequential Laundry

[Figure: timeline from 6 PM to 2 AM showing loads A, B, C, and D in task order, each load processed only after the previous one has finished]

Sequential laundry takes 8 hours for four loads of wash …

Page 4: COE 308

Pipelined Laundry

… while pipelined laundry takes just 3.5 hours

[Figure: timeline from 6 PM to 2 AM showing loads A, B, C, and D in task order, with the steps of successive loads overlapped]
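The two totals can be checked with a little arithmetic. The sketch below assumes four steps of 30 minutes each. That duration is not stated on the slides, but it is the one consistent with both totals (8 hours sequential, 3.5 hours pipelined). The function names are illustrative.

```python
def sequential_hours(loads: int, steps: int, step_hours: float) -> float:
    """Each load finishes all of its steps before the next load starts."""
    return loads * steps * step_hours


def pipelined_hours(loads: int, steps: int, step_hours: float) -> float:
    """A new load starts as soon as the first station is free.

    The first load needs `steps` time slots; every later load finishes
    one slot after the load before it.
    """
    return (steps + loads - 1) * step_hours


# Assumption: 4 steps (wash, dry, fold, put away) of 0.5 hours each.
print(sequential_hours(4, 4, 0.5))  # 8.0
print(pipelined_hours(4, 4, 0.5))   # 3.5
```

Each individual load still takes 2 hours from washer to closet. Pipelining shortens the total time for all loads, not the time for one load.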

Page 5: COE 308

Pipelining Analysis

Pipelining is possible because:

• All four laundry steps use independent stations
  – Washing uses the washer, which is independent from the dryer used in the drying step and from the table used in the folding step.
  – This means that once the washing step is done, the washer can be used for another load while the current load is drying in the dryer.
• All steps are always used in the same order
  – Washing always occurs before drying, as it is not correct to dry clothes that haven’t been washed yet
  – Drying always occurs before folding
  – …

Page 6: COE 308

Pipelining Processor Execution

• Processor executes instructions
• Can the instruction execution process be pipelined?
  – Yes, because it can be divided into steps
  – And because the order of the execution steps is the same (most of the time)
• Instruction execution steps
  – Fetch instruction from memory
  – Read registers while decoding the instruction
  – Execute the operation
  – Access an operand in data memory
  – Write the result into a register

Page 7: COE 308

Pipeline Stages

Instruction execution steps are called pipeline stages:

• Instruction Fetch (IF)
• Instruction Decode (ID)
• EXecute operation (EX)
• MEMory access (MEM)
• Write Back the result (WB)

[Figure: the five stages in sequence: IF, ID, EX, MEM, WB]

Page 8: COE 308

Processor Pipeline

Pipeline is well represented as a timing diagram (as in the laundry example)

The following sequence is represented:

add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9
addi $10, $6, $3

Clock Cycle  1   2   3   4   5   6   7   8   9
add          IF  ID  EX  MEM WB
sub              IF  ID  EX  MEM WB
and                  IF  ID  EX  MEM WB
or                       IF  ID  EX  MEM WB
addi                         IF  ID  EX  MEM WB

Five instructions are executed in 9 cycles
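The count of 9 cycles follows from the number of stages plus one extra cycle for every instruction after the first. The sketch below is a minimal illustration that assumes an ideal pipeline with no stalls; `total_cycles` and `timing_diagram` are hypothetical helper names, not part of the course material.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(num_instructions: int, num_stages: int = len(STAGES)) -> int:
    """Cycles needed when one instruction enters the pipeline per cycle."""
    return num_stages + num_instructions - 1


def timing_diagram(names: list[str]) -> list[str]:
    """Return one text row per instruction; instruction i starts in cycle i+1."""
    rows = []
    for i, name in enumerate(names):
        cells = [""] * i + STAGES
        rows.append(f"{name:<5}" + "".join(f"{c:<4}" for c in cells).rstrip())
    return rows


program = ["add", "sub", "and", "or", "addi"]
print(total_cycles(len(program)))  # 9
print("\n".join(timing_diagram(program)))
```

Without pipelining, the same five instructions would need 5 × 5 = 25 cycles if every instruction passed through all five stages one at a time.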

Page 9: COE 308

Data Dependency Hazard

Examine the following instructions:

add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9

There is a dependency between add and sub on register $1, as it is used by sub after it is modified by add

add   IF  ID  EX  MEM WB
sub       IF  ID  EX  MEM WB

The result of the add instruction is not written into register $1 before the WB stage

However, the sub instruction reads the value of register $1 during its ID stage

Problem: The sub instruction will read the wrong value of register $1 because the correct value has not been written there yet.

Page 10: COE 308

Types of Dependencies

All cases of data dependencies should be analyzed to see whether they cause any malfunction in the pipeline context:

Data Dependency cases:

• Read After Write (RAW)
• Read After Read (RAR)
• Write After Write (WAW)
• Write After Read (WAR)
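The four cases can be told apart by comparing the destination and source registers of two instructions. The sketch below is an illustration only; the `Instr` tuple and the `dependencies` function are hypothetical names, and only register operands are considered (no memory dependencies).

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple[str, ...]


def dependencies(first: Instr, second: Instr) -> set[str]:
    """Classify the register dependencies of `second` on the earlier `first`."""
    kinds = set()
    if first.dest in second.srcs:
        kinds.add("RAW")   # second reads what first writes
    if set(first.srcs) & set(second.srcs):
        kinds.add("RAR")   # both read the same register
    if first.dest == second.dest:
        kinds.add("WAW")   # both write the same register
    if second.dest in first.srcs:
        kinds.add("WAR")   # second writes what first reads
    return kinds


add = Instr("add", "$1", ("$3", "$5"))
sub = Instr("sub", "$3", ("$1", "$4"))
print(sorted(dependencies(add, sub)))  # ['RAW', 'WAR']
```

A single pair of instructions can exhibit several dependency types at once, as the add/sub pair does here (RAW on $1, WAR on $3).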

Page 11: COE 308

RAW Dependency

add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9

Read After Write (RAW) dependencies

An instruction uses as a source a register that is the destination of a previous instruction. The later instruction therefore needs to read the value of this register while the previous instruction has yet to write it.

Problem: The next instruction(s) will fetch the wrong values of the dependent registers because the correct values have not been written back yet.

Page 12: COE 308

RAR Dependency

add $1, $3, $5
sub $3, $5, $4
and $2, $4, $1
or $7, $1, $9

Read After Read (RAR) dependencies

Two consecutive instructions use the same register as a source operand

No Problem: As long as the registers are not modified, pipelining does not affect the normal execution process in this case

Page 13: COE 308

WAW Dependency

add $1, $3, $5
sub $1, $5, $4
and $4, $4, $1
or $4, $1, $9

Write After Write (WAW) dependencies

Two consecutive instructions use the same register as a destination operand

No Problem: Writes occur during the last pipeline stage and no inconsistency results from this situation because the instruction execution order is maintained

Page 14: COE 308

WAR Dependency

add $1, $3, $5
sub $3, $5, $2
and $2, $4, $1
or $7, $1, $9

Write After Read (WAR) dependencies

The next instruction uses as its destination a register that a previous instruction uses as a source operand

No Problem: Read occurs in the ID stage and Write occurs in the WB stage, which means that the order of operations is not altered by the pipeline structure

Page 15: COE 308

RAW Dependency Cases

i:   add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9

Case 1: dependency between instruction i and instruction i+1

Case 2: dependency between instruction i and instruction i+2

Case 3: dependency between instruction i and instruction i+3

Every case needs to be checked in order to determine whether it poses a real problem or not

Page 16: COE 308

RAW Dependency Case 1

i:   add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9

Case 1: dependency between instruction i and instruction i+1

add   IF  ID  EX  MEM WB
sub       IF  ID  EX  MEM WB

Operand is fetched BEFORE it is written back

Page 17: COE 308

RAW Dependency Case 2

i:   add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9

Case 2: dependency between instruction i and instruction i+2

add   IF  ID  EX  MEM WB
sub       IF  ID  EX  MEM WB
and           IF  ID  EX  MEM WB

Operand is fetched BEFORE it is written back

Page 18: COE 308

RAW Dependency Case 3

i:   add $1, $3, $5
i+1: sub $3, $1, $4
i+2: and $2, $5, $1
i+3: or $7, $1, $9

Case 3: dependency between instruction i and instruction i+3

add   IF  ID  EX  MEM WB
sub       IF  ID  EX  MEM WB
and           IF  ID  EX  MEM WB
or                IF  ID  EX  MEM WB

Operand is fetched AT THE SAME TIME it is written back

Page 19: COE 308

Register File Model

Case 3 does not pose a problem because we assume that:

In the Register File, Writes occur BEFORE Reads

This is only true if we use the falling edge of the clock to write

[Figure: clock waveform during the ID stage with three labeled points: "Write is prepared here", "Write occurs here", "Read occurs here"]
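The three RAW cases can be compared by counting cycles. The sketch below assumes the five-stage pipeline of these slides, with instruction i fetched in cycle 1, and treats the write-before-read register file as an optional assumption. The function name `raw_case` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def raw_case(distance: int, write_before_read: bool = True) -> str:
    """Compare when instruction i writes a register with when i+distance reads it.

    Instruction i is fetched in cycle 1, so it writes back in cycle 5.
    Instruction i+distance is fetched in cycle 1+distance and reads its
    operands one cycle later, in its ID stage.
    """
    write_cycle = 1 + STAGES.index("WB")
    read_cycle = 1 + distance + STAGES.index("ID")
    if read_cycle < write_cycle:
        return "hazard: operand read before write-back"
    if read_cycle == write_cycle:
        # Same cycle: safe only if the register file writes before it reads.
        if write_before_read:
            return "ok: written and read in the same cycle"
        return "hazard: read and write collide"
    return "ok: operand read after write-back"


for d in (1, 2, 3, 4):
    print(d, raw_case(d))
```

Under these assumptions only distances 1 and 2 (Cases 1 and 2) are real hazards, which is why two delay cycles are enough to resolve them.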

Page 20: COE 308

Data Dependency Solutions

• Data dependency between instructions causes operands to be fetched at the wrong time.
• Obvious remedy is to DELAY the fetch of operands until after the correct value is written in the register file
  – In software, by inserting NOP instructions
  – In hardware, by stalling the pipeline

Page 21: COE 308

NOP Insertion

Insertion of two NOP instructions will solve the data dependency problem

add $1, $3, $5
nop
nop
sub $3, $1, $4
and $2, $5, $1

add   IF  ID  EX  MEM WB
nop       IF  ID  EX  MEM WB
nop           IF  ID  EX  MEM WB
sub               IF  ID  EX  MEM WB
and                   IF  ID  EX  MEM WB
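The software remedy can be sketched as a pass over the instruction list. This is an illustration under stated assumptions, not a real assembler pass: it assumes the write-before-read register file described earlier (so a consumer must sit at least three slots after its producer), it considers register operands only, and `Instr`, `NOP`, and `insert_nops` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: tuple[str, ...] = ()


NOP = Instr("nop", None)
MIN_DISTANCE = 3  # assumption: consumer must be at least 3 slots after its producer


def insert_nops(program: list[Instr]) -> list[Instr]:
    """Pad the program with NOPs so that no RAW hazard remains."""
    out: list[Instr] = []
    for instr in program:
        needed = 0
        # Only the last MIN_DISTANCE - 1 emitted instructions can be too close.
        recent = reversed(out[-(MIN_DISTANCE - 1):])
        for back, prev in enumerate(recent, start=1):
            if prev.dest is not None and prev.dest in instr.srcs:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    Instr("add", "$1", ("$3", "$5")),
    Instr("sub", "$3", ("$1", "$4")),
    Instr("and", "$2", ("$5", "$1")),
]
print([i.op for i in insert_nops(program)])  # ['add', 'nop', 'nop', 'sub', 'and']
```

The `and` instruction needs no extra NOPs: the two inserted for `sub` already place it far enough from `add`.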

Page 22: COE 308

Pipeline Stall

Delaying the fetch of the operands can also be implemented in hardware

add $1, $3, $5
sub $3, $1, $4
and $2, $5, $1
or $7, $1, $9
addi $10, $6, $3

add   IF  ID  EX  MEM WB
sub       IF  IF  IF  ID  EX  MEM WB
and                   IF  ID  EX  MEM
or                        IF  ID  EX
addi                          IF  ID

The instruction sub is maintained in the IF stage for two extra clock cycles

It is equivalent to …

Page 23: COE 308

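Pipeline Stall

… inserting bubbles in the pipeline

[Figure: timing diagram with rows labeled add, sub, and or. add runs IF ID EX MEM WB; sub is held in IF (IF IF IF) before continuing with ID EX MEM WB; two bubble rows run ID EX MEM WB without an IF stage; the following instruction runs IF ID EX MEM]

The instruction sub is maintained in the IF stage for two extra clock cycles, while virtual nop instructions are inserted in the pipeline (as bubbles)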
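The hardware stall can be sketched as a small scheduler that holds an instruction in IF until its source registers have been written. This is a simplified model under assumptions not spelled out on the slides: stalls happen only in IF, the register file writes before it reads in the same cycle, and no forwarding is used. `Instr` and `schedule_with_stalls` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: tuple[str, ...] = ()


def schedule_with_stalls(program: list[Instr]) -> list[tuple[str, int, list[str]]]:
    """Return (op, first cycle, stage occupied in each following cycle) per instruction."""
    rows = []
    wb_cycle: dict[str, int] = {}   # register -> cycle in which it is written back
    prev_decode = 1                 # makes the first instruction fetch in cycle 1
    for instr in program:
        fetch = prev_decode         # IF frees up when the previous instruction enters ID
        ready = max((wb_cycle.get(r, 0) for r in instr.srcs), default=0)
        decode = max(fetch + 1, ready)   # reading in the write-back cycle is allowed
        stages = ["IF"] * (decode - fetch) + ["ID", "EX", "MEM", "WB"]
        rows.append((instr.op, fetch, stages))
        if instr.dest is not None:
            wb_cycle[instr.dest] = decode + 3   # ID -> EX -> MEM -> WB
        prev_decode = decode
    return rows


program = [
    Instr("add", "$1", ("$3", "$5")),
    Instr("sub", "$3", ("$1", "$4")),
    Instr("and", "$2", ("$5", "$1")),
    Instr("or", "$7", ("$1", "$9")),
]
for op, start, stages in schedule_with_stalls(program):
    print(f"{op:<4} starts in cycle {start}: {' '.join(stages)}")
```

In this model `sub` occupies IF for three cycles (one normal, two stalled) and the instructions behind it are fetched later by the same two cycles.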

Page 24: COE 308

Branch Hazard

Examine the following instructions:

beq $1, $3, Target
sub $3, $1, $4
and $2, $5, $1
...
Target: or $3, $5, $9

In the case the branch is taken, the instructions sub and and are wrongfully executed because they are fetched BEFORE the branch decision is made

Problem: Modification of the Program Logic: Unacceptable Behavior

beq   IF  ID  EX  MEM WB
sub       IF  ID  EX  MEM WB
and           IF  ID  EX  MEM WB
or                IF  ID  EX  MEM WB

or: Branch decision is taken and Target is fetched

Page 25: COE 308

Branch Hazard Solution

The solution is to:

• Not let the instructions after the branch finish execution in the case the branch is taken
  – Instruction transformation into nops (in hardware)
• Put instructions which do not disturb the logic of the program after the branch instruction, so that their execution will not modify the logic of the program.
  – Insertion of nop instructions after each branch instruction (by the compiler)

Page 26: COE 308

NOP forcing

beq   IF  ID  EX  MEM WB
sub       IF  ID  EX  MEM WB
and           IF  ID  EX  MEM WB
or                IF  ID  EX  MEM WB

sub, and: Transformed into NOPs after branch taken
or: Branch decision is taken and Target is fetched

After the branch is taken, the following instructions are forced to NOP instructions for the subsequent pipeline stages until the branch target instruction is fetched. A NOP has no effect. It is also said that the instruction execution is killed.
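The effect of killing instructions can be sketched by listing what actually flows down the pipeline. The model below is deliberately simple and assumes, as in the diagram, that two instructions are fetched before the branch decision is known. `pipeline_stream` and `WRONG_PATH_FETCHES` are hypothetical names.

```python
WRONG_PATH_FETCHES = 2  # assumption: two instructions enter the pipeline before the decision


def pipeline_stream(program: list[str], branch_index: int,
                    target_index: int, taken: bool) -> list[str]:
    """List what flows down the pipeline, showing killed instructions as nop."""
    stream = program[: branch_index + 1]
    if not taken:
        return stream + program[branch_index + 1:]
    wrong_path = program[branch_index + 1: branch_index + 1 + WRONG_PATH_FETCHES]
    killed = ["nop"] * len(wrong_path)      # fetched, then forced to NOPs
    return stream + killed + program[target_index:]


program = [
    "beq $1, $3, Target",
    "sub $3, $1, $4",
    "and $2, $5, $1",
    "or $3, $5, $9",       # Target (instructions elided on the slide are left out)
]
print(pipeline_stream(program, branch_index=0, target_index=3, taken=True))
```

When the branch is not taken, nothing is killed and `sub` and `and` execute normally, so the penalty is paid only on taken branches.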

Page 27: COE 308

NOP Insertion

beq $1, $3, Target
sub $3, $1, $4
and $2, $5, $1
...
Target: or $3, $5, $9

beq   IF  ID  EX  MEM WB
nop       IF  ID  EX  MEM WB
nop           IF  ID  EX  MEM WB
or                IF  ID  EX  MEM WB

Insertion of NOP instructions, by the compiler, after each branch instruction does not disturb the logic of the program.

Page 28: COE 308

Delayed Branch

• Insertion of NOP instructions introduces a substantial overhead that increases the instruction count significantly.
• Idea is to move actual instructions from the area before the branch to the slots after the branch, filling the nop slots without modifying the logic of the program

Original code

xor $2, $2, $5        No dependency
and $1, $7, $8        Register $1 used by beq
sub $10, $6, $4       No dependency
add $3, $6, $7        Register $3 used by beq
beq $1, $3, Target
sub $3, $1, $4
and $2, $5, $1

Transformed code

and $1, $7, $8
add $3, $6, $7
beq $1, $3, Target
xor $2, $2, $5
sub $10, $6, $4
sub $3, $1, $4
and $2, $5, $1

Page 29: COE 308

Delayed Branch

Consider the transformed code obtained after moving the xor and sub instructions after the beq instruction:

and $1, $7, $8
add $3, $6, $7
beq $1, $3, Target    <- (1)
xor $2, $2, $5
sub $10, $6, $4       <- (2)
sub $3, $1, $4
and $2, $5, $1

(1) A programmer who reads the code without any idea about the execution will think that the branch occurs here

(2) The execution will actually make the branch take effect here; so when the branch is taken, the instructions xor and sub are executed, while the second sub and the and instructions are not

Branch instruction and branch execution are separated by a two-instruction delay; that is why it is called a Delayed Branch
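The transformation can be sketched as a small scheduling pass. This is an illustration under assumptions, not the method of any particular compiler: it considers register dependencies only (no loads or stores), it assumes two delay slots, and `Instr`, `conflicts`, and `fill_delay_slots` are hypothetical names. An instruction is moved only if the branch and every instruction it jumps over are independent of it.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: tuple[str, ...] = ()


NOP = Instr("nop", None)


def conflicts(earlier: Instr, later: Instr) -> bool:
    """True if swapping the two instructions would break a register dependency."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = earlier.dest is not None and earlier.dest == later.dest
    return raw or war or waw


def fill_delay_slots(before: list[Instr], branch: Instr, slots: int = 2) -> list[Instr]:
    kept: list[Instr] = []    # stay before the branch (collected back to front)
    moved: list[Instr] = []   # go into the delay slots (collected back to front)
    for instr in reversed(before):
        jumped_over = kept + [branch]
        if len(moved) < slots and not any(conflicts(instr, j) for j in jumped_over):
            moved.append(instr)
        else:
            kept.append(instr)
    padding = [NOP] * (slots - len(moved))   # unfilled slots still need nops
    return kept[::-1] + [branch] + moved[::-1] + padding


before = [
    Instr("xor", "$2", ("$2", "$5")),
    Instr("and", "$1", ("$7", "$8")),
    Instr("sub", "$10", ("$6", "$4")),
    Instr("add", "$3", ("$6", "$7")),
]
beq = Instr("beq", None, ("$1", "$3"))
print([i.op for i in fill_delay_slots(before, beq)])  # ['and', 'add', 'beq', 'xor', 'sub']
```

Because the moved instructions originally came before the branch, they must execute whether or not the branch is taken, which is exactly what the delay slots guarantee.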

Page 30: COE 308

Pipelined Datapath

Page 31: COE 308

Inserting Pipeline Registers

Page 32: COE 308

Writing Back the Result

Page 33: COE 308

Destination Register Specifier ?

Page 34: COE 308

Branch Logic

Page 35: COE 308

Pipelined Control

Page 36: COE 308

Data Hazards and Forwarding

Page 37: COE 308

Forwarding Unit