power and frequency analysis for data and control independence in embedded processors

Post on 31-Jan-2016



Power and Frequency Analysis for Data and Control Independence in Embedded Processors

Farzad Samie (Sharif University of Technology)

Amirali Baniasadi (University of Victoria)

This Work

Goal

• Power and frequency analysis for control independent and data independent instructions in embedded processors

Motivation

• Embedded processors are becoming complex

• Modern embedded processors use speculation

• Mis-speculation causes performance and power penalties

• Power is a major concern in embedded processors

• Save power and gain performance


This Work (cont.)

Our Approach

• Reduce the energy and time wasted on mispredictions.

How?

• Identify and bypass Control Independent (CI) and Data Independent (DI) instructions.

• CIs: instructions that execute regardless of the branch outcome.

• CI-DI: CI instructions that execute with the same operands regardless of the branch outcome.

Key Result

• 12% processor energy reduction.


Background

Branch Prediction

[Figure: branch predictor block diagram. Inputs: Branch History and Program Counter. Outputs: Predicted direction and Predicted target address.]
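The slide names only the predictor's inputs (branch history, program counter) and outputs (predicted direction, predicted target address), not its organization. As an illustration only, the sketch below assumes a gshare-style table of 2-bit counters plus a simple target buffer; the class name, table size, and addresses are hypothetical and are not taken from the presentation.

```python
class GsharePredictor:
    """Illustrative gshare-style predictor (an assumption, not the slides' design)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # 2-bit saturating counters, initialised to "weakly not taken" (1).
        self.counters = [1] * (1 << index_bits)
        self.history = 0   # global branch history register
        self.btb = {}      # branch PC -> last seen taken target

    def _index(self, pc):
        # gshare: XOR the program counter with the branch history.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        """Return (predicted direction, predicted target address or None)."""
        taken = self.counters[self._index(pc)] >= 2
        target = self.btb.get(pc) if taken else None
        return taken, target

    def update(self, pc, taken, target):
        """Train the predictor with the resolved branch outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GsharePredictor()
cold = predictor.predict(0x40)          # untrained prediction
for _ in range(20):                     # train on an always-taken branch
    predictor.update(0x40, True, 0x80)
warm = predictor.predict(0x40)          # prediction after training
```

Before training the sketch predicts not taken with no target; after the always-taken training loop it predicts taken with the recorded target. A misprediction is any case where the resolved outcome differs from this prediction, which is what sends the processor down the wrong path.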

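Background (cont.)

[Figure: control-flow example with instructions I1 to I12. A Branch Inst. has a Not taken path and a Taken path. After Misprediction Detection, the Wrong Path is squashed and the Right Path is executed. The instructions that follow on both paths are Control Independent Instructions (CIs).]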

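Background (cont.)

[Figure: register example of CI-DI vs. CI-DD instructions]

Before the branch: R1←R1+R2, R4←R1, If (R4=0)

One branch path (Not taken / Taken): R2←R4-R1, R5←R2-R3, R3←0

Other branch path: R5←R4+1, R1←R1-1, R3←0

Control independent instructions after the two paths:

• R4←R6+R4: Data Independent (CI-DI)

• R1←R4+R1: Data Dependent (CI-DD)

• R5←R5-2: Data Dependent (CI-DD)

• R3←R3-R4: Data Independent (CI-DI)

The dependences come from R1←R1-1 on one path and from R5←R2-R3 and R5←R4+1 on the two paths.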
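The CI-DI / CI-DD distinction in the register example above can be checked with a small sketch. It runs both branch paths from the same pre-branch state and labels each control independent instruction CI-DI when all of its source operands hold the same values on both paths, and CI-DD otherwise. The initial register values, the grouping of instructions into paths, and the helper names `run` and `classify` are assumptions made for illustration; a single concrete input can also make two values match by coincidence, so this is a demonstration rather than a proof.

```python
# Each instruction: (destination, source registers, function of the source values).
pre_branch = [
    ("R1", ("R1", "R2"), lambda r1, r2: r1 + r2),
    ("R4", ("R1",), lambda r1: r1),
]
path_one = [
    ("R2", ("R4", "R1"), lambda r4, r1: r4 - r1),
    ("R5", ("R2", "R3"), lambda r2, r3: r2 - r3),
    ("R3", (), lambda: 0),
]
path_other = [
    ("R5", ("R4",), lambda r4: r4 + 1),
    ("R1", ("R1",), lambda r1: r1 - 1),
    ("R3", (), lambda: 0),
]
control_independent = [
    ("R4", ("R6", "R4"), lambda r6, r4: r6 + r4),
    ("R1", ("R4", "R1"), lambda r4, r1: r4 + r1),
    ("R5", ("R5",), lambda r5: r5 - 2),
    ("R3", ("R3", "R4"), lambda r3, r4: r3 - r4),
]


def run(instructions, regs):
    """Execute instructions on a copy of regs and return the new state."""
    regs = dict(regs)
    for dest, srcs, fn in instructions:
        regs[dest] = fn(*(regs[s] for s in srcs))
    return regs


def classify(ci_instructions, regs_a, regs_b):
    """Label each CI instruction by comparing its operands across both paths."""
    a, b = dict(regs_a), dict(regs_b)
    labels = []
    for dest, srcs, fn in ci_instructions:
        same_operands = all(a[s] == b[s] for s in srcs)
        labels.append("CI-DI" if same_operands else "CI-DD")
        a[dest] = fn(*(a[s] for s in srcs))
        b[dest] = fn(*(b[s] for s in srcs))
    return labels


# Arbitrary (assumed) initial register values.
start = {"R1": 5, "R2": 3, "R3": 4, "R4": 0, "R5": 0, "R6": 2}
at_branch = run(pre_branch, start)
labels = classify(control_independent,
                  run(path_one, at_branch),
                  run(path_other, at_branch))
print(labels)
```

With these values the four control independent instructions are labeled CI-DI, CI-DD, CI-DD, CI-DI, in the same order as the labels on the slide.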

CI-DI vs. CI-DD

• Bypassing CI-DIs saves more energy

• No need to read operands or execute again

• Bypassing CI-DIs provides higher performance

• No time is wasted re-reading operands or re-executing

[Figure: pipeline stages (Fetch, Issue, Dispatch, Execute, WriteBack) for CI-DD and CI-DI instructions]

Methodology

• Modified SimpleScalar

• Wattch for power measurement

• MiBench: Embedded Benchmark Suite


Distribution

Wrong Path: 12%, CI: 5%, CI-DI: 2%

CI Power Reduction in Different Units

Max: branch predictor unit, Min: instruction cache


CI Power Reduction in Stages

Rijndael: low misprediction rate, hence few wrong-path instructions and few CIs

Power Sensitivity to RUU size

[Figure panels: CI, CI-DI]

Higher power dissipation for bigger RUU sizes

Power Sensitivity to Execution Bandwidth

[Figure panels: CI, CI-DI]

Higher power dissipation for wider execution bandwidth

Power Sensitivity to Branch Predictor Size

Little sensitivity to branch predictor size

Related Work

• Rotenberg et al.: studied control independence in superscalar processors, HPCA 1999.

• Collins et al.: proposed a mechanism to predict the reconvergence point, MICRO 2004.

• Lam and Wilson: studied the impact of CIs on instruction-level parallelism, ISCA 1992.

• Gandhi et al.: selective recovery from branch mispredictions, HPCA 2004.

Conclusion

• CIs are categorized into CI-DI and CI-DD

• Potential power saving of up to 12% from bypassing CI and CI-DI instructions

• High sensitivity to RUU size

• High sensitivity to execution bandwidth

• Little sensitivity to branch predictor size


Questions

Thank you
