Page 1:

VADA Lab., SungKyunKwan Univ.

L5: Lower Power Architecture Design

1999. 8.2, Sungkyunkwan University, Prof. Jun Dong Cho

http://vada.skku.ac.kr

Page 2:

Architectural-level Synthesis
• Translate HDL models into sequencing graphs.
• Behavioral-level optimization:
  – Optimize abstract models independently from the implementation parameters.
• Architectural synthesis and optimization:
  – Create macroscopic structure:
    • data-path and control-unit.
  – Consider area and delay information of the implementation.
• Hardware compilation:
  – Compile HDL model into sequencing graph.
  – Optimize sequencing graph.
  – Generate gate-level interconnection for a cell library.

Page 3:

Architecture-Level Solutions
• Architecture-driven voltage scaling: choose a more parallel architecture. Lowering Vdd reduces energy but increases delay, so the added parallelism is used to recover the lost throughput.
• Regularity: to minimize the power in the control hardware and the interconnection network.
• Modularity: to exploit data locality through distributed processing units, memories and control.
  – Spatial locality: an algorithm can be partitioned into natural clusters based on connectivity.
  – Temporal locality: average lifetimes of variables (less temporary storage; high probability that future accesses go to items referenced in the recent past).
• Few memory references: since references to memories are expensive in terms of power.
• Precompute physical capacitance of interconnect and switching activity (number of bus accesses
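The voltage-scaling trade-off in the first bullet can be illustrated numerically. The sketch below is a minimal example under stated assumptions: dynamic power is taken as proportional to C * Vdd^2 * f, gate delay is approximated by the first-order model Vdd / (Vdd - Vt)^2, and the threshold voltage, reference supply and capacitance overhead are illustrative numbers that do not come from the slides.

```python
def gate_delay(vdd, vt=0.7):
    """Assumed first-order delay model: delay ~ Vdd / (Vdd - Vt)^2."""
    return vdd / (vdd - vt) ** 2


def scaled_vdd(slowdown, vdd_ref=5.0, vt=0.7):
    """Supply voltage at which the delay is `slowdown` times the delay at vdd_ref.

    The model delay falls monotonically as Vdd rises above Vt,
    so a plain bisection between Vt and vdd_ref is enough.
    """
    target = slowdown * gate_delay(vdd_ref, vt)
    lo, hi = vt + 1e-6, vdd_ref
    for _ in range(100):
        mid = (lo + hi) / 2
        if gate_delay(mid, vt) > target:
            lo = mid  # still too slow: the supply must be higher
        else:
            hi = mid
    return hi


def parallel_power_ratio(n, cap_overhead=1.15, vdd_ref=5.0, vt=0.7):
    """Power of an n-way parallel datapath relative to the reference datapath.

    Each unit runs at f/n and may therefore be n times slower.
    Total capacitance is n * cap_overhead times the reference
    (cap_overhead is an assumed figure for routing and multiplexing).
    """
    vdd = scaled_vdd(n, vdd_ref, vt)
    cap_ratio = n * cap_overhead
    freq_ratio = 1.0 / n
    return cap_ratio * (vdd / vdd_ref) ** 2 * freq_ratio, vdd


ratio, vdd = parallel_power_ratio(2)
print(f"2-way parallel: Vdd = {vdd:.2f} V, power ratio = {ratio:.2f}")
```

Under these assumptions the throughput stays constant while the supply drops, and the quadratic dependence on Vdd outweighs the extra capacitance of the duplicated hardware.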

Page 4:

Power Measure of P

Page 5:

Architecture Trade-off: Reference Data Path

Page 6:

Parallel Data Path

Page 7:

Pipelined Data Path

Page 8:

A Simple Data Path, Result4

Page 9:

Uni-processor Implementation

Page 10:

Multi-Processor Implementation

Page 11:

Datapath Parallelization

Page 12:

Memory Parallelization

At first order, P = C * f/2 * Vdd^2
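As a quick numeric check of the first-order expression P = C * f/2 * Vdd^2, the sketch below evaluates it for a few operating points. The capacitance, frequency and supply values are made-up illustrative numbers; only the form of the expression comes from the slide.

```python
def first_order_power(c_farads, f_hz, vdd_volts):
    """First-order power estimate: P = C * (f / 2) * Vdd^2."""
    return c_farads * (f_hz / 2.0) * vdd_volts ** 2


# Illustrative operating points (assumed values, not from the slides).
base = first_order_power(10e-12, 100e6, 3.3)       # reference
half_rate = first_order_power(10e-12, 50e6, 3.3)   # access rate halved
low_vdd = first_order_power(10e-12, 100e6, 1.65)   # supply halved

print(f"half rate: {half_rate / base:.2f} x reference power")
print(f"half Vdd:  {low_vdd / base:.2f} x reference power")
```

Power is linear in the access frequency but quadratic in the supply voltage, which is why a parallel organization that permits a lower Vdd can pay off even when it adds capacitance.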

Page 13:

VLIW Architecture

Page 14:

VLIW - cont.
• The compiler takes responsibility for finding the operations that can be issued in parallel and for creating a single very long instruction containing them. VLIW instruction decoding is easier than superscalar decoding because the format is fixed and the operations within one instruction have no dependencies on each other.
• The fixed format can place more limitations on how operations may be combined.
• Intel P6: CISC instructions are decoded on chip into a set of micro-operations (i.e., a long instruction word) that can be executed in parallel.
• As power becomes a major issue in the design of fast microprocessors, the simpler architecture is the better one.
• VLIW architectures, being simpler than N-issue machines, can be considered promising for achieving high speed and low power simultaneously.
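The compiler's role described above can be sketched as a small packing routine. This is a minimal, hypothetical example: the function name `pack_vliw`, the dependency table and the slot width are assumptions for illustration, it only honors true data dependencies, and it is not the scheduler of any real VLIW compiler.

```python
def pack_vliw(ops, deps, width):
    """Greedily pack operations into fixed-width long instruction words.

    ops   : operation names in program order (producers before consumers)
    deps  : dict mapping an operation to the set of operations it depends on
    width : number of operation slots per long instruction (assumed fixed)
    """
    placed = {}   # operation -> index of the long instruction holding it
    bundles = []  # each bundle is one long instruction word
    for op in ops:
        # An operation must go after every operation it depends on.
        slot = 0
        for producer in deps.get(op, ()):
            slot = max(slot, placed[producer] + 1)
        # Skip long instructions that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        if slot == len(bundles):
            bundles.append([])
        bundles[slot].append(op)
        placed[op] = slot
    return bundles


ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a", "b"}, "e": {"c", "d"}}
bundles = pack_vliw(ops, deps, width=2)
print(bundles)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Because all dependency checks happen at compile time, the hardware only has to issue each bundle as a unit, which is the source of the simpler decoding mentioned in the first bullet.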

Page 15:

Synchronous VS. Asynchronous

• Synchronous system: a signal path starts at a clocked flip-flop, passes through combinational gates, and ends at another clocked flip-flop. The clock signals do not participate in the computation but are required for synchronization. As technology advances, systems keep getting bigger, and the delay on the clock wires can no longer be ignored. Clock skew is thus becoming a bottleneck for many system designers. Many gates switch unnecessarily just because they are connected to the clock, not because they have new inputs to process. The biggest gate is the clock driver itself, which must switch on every clock cycle.

• Asynchronous system (self-timed): an input signal (request) starts the computation in a module, and an output signal (acknowledge) signals the completion of the computation and the availability of the requested data. Asynchronous systems are potentially responsive to transitions on any of their inputs at any time, since they have no clock with which to sample their inputs.
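The request/acknowledge behavior of a self-timed module can be modeled in a few lines. The sketch below assumes a four-phase (return-to-zero) handshake; the slides only name the request and acknowledge signals, so the protocol choice, the class `SelfTimedModule` and the helper `transfer` are illustrative assumptions.

```python
class SelfTimedModule:
    """Software model of a module that computes only when a request arrives."""

    def __init__(self, func):
        self.func = func
        self.ack = False       # acknowledge line
        self.data_out = None   # valid while ack is high
        self.activations = 0   # how many times the logic actually ran

    def request(self, req, data_in=None):
        """React to the request line (four-phase protocol, assumed)."""
        if req and not self.ack:
            # Request raised: compute, then acknowledge completion.
            self.data_out = self.func(data_in)
            self.activations += 1
            self.ack = True
        elif not req and self.ack:
            # Request released: release acknowledge and return to idle.
            self.ack = False
        return self.ack


def transfer(module, value):
    """One complete handshake: req up, ack up, req down, ack down."""
    assert module.request(True, value)
    result = module.data_out
    assert not module.request(False)
    return result


adder = SelfTimedModule(lambda x: x + 1)
results = [transfer(adder, v) for v in (1, 5, 9)]
print(results, adder.activations)  # [2, 6, 10] 3
```

The module's logic runs exactly once per request and never while idle, in contrast to clocked gates that switch on every cycle whether or not new inputs are present.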

Page 16:

Asynchronous - Cont.

• More difficult to implement: requires explicit synchronization between communicating blocks without clocks.
• If a signal feeds directly into conventional gate-level circuitry, invalid logic levels could propagate throughout the system.
• Glitches, which are filtered out by the clock in synchronous designs, may cause an asynchronous design to malfunction.
• Asynchronous designs are not widely used; designers cannot find the supporting design tools and methodologies they need.
• The DCC error corrector of a compact cassette player saves 80% of the power compared with its synchronous counterpart.
• Offers more architectural options and freedom, encourages distributed, localized control, and offers more freedom to adapt the supply voltage.

S. Furber, M. Edwards. “Asynchronous Design Methodologies”. 1993

Page 17:

Asynchronous design with adaptive scaling of the supply voltage

(a) Synchronous system

(b) Asynchronous system with adaptive scaling of the supply voltage

Page 18:

Asynchronous Pipeline

Page 19:

PIPELINED SELF-TIMED MICROPROCESSOR

Page 20:

Hazard-free Circuits

6% more logic