![Page 1: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/1.jpg)
L9 : Low Power DSP
Jun-Dong Cho, SungKyunKwan Univ.
Dept. of ECE, Vada Lab. http://vada.skku.ac.kr
![Page 2: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/2.jpg)
Low Power DSP
• Most of the execution time is spent in DO-LOOPs:
– VSELP vocoder: 83.4 %
– 2D 8x8 DCT: 98.3 %
– LPC computation: 98.0 %
• Minimizing the power of the DO-LOOPs ==> minimizing the power of the DSP
VSELP: Vector Sum Excited Linear Prediction; LPC: Linear Predictive Coding
![Page 3: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/3.jpg)
VLSI Signal Processing Design Methodology
• pipelining, parallel processing, retiming, folding, unfolding, look-ahead, relaxed look-ahead, and approximate filtering
• bit-serial, bit-parallel and digit-serial architectures, carry-save architecture
• redundant and residue number systems
• Viterbi decoder, motion compensation, 2D filtering, and data transmission systems
![Page 4: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/4.jpg)
Loop unrolling
• The technique of loop unrolling replicates the body of a loop some number of times (unrolling factor u) and then iterates by step u instead of step 1. This transformation reduces the loop overhead, increases the instruction parallelism and improves register, data cache or TLB locality.
![Page 5: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/5.jpg)
Loop Unrolling Effects
• Loop overhead is cut in half, because each pass through the unrolled loop performs two of the original iterations.
• If array elements are assigned to registers, register locality is improved because A(i) and A(i + 1) are used twice in the loop body.
• Instruction parallelism is increased because the second assignment can be performed while the results of the first are being stored and the loop variables are being updated.
![Page 6: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/6.jpg)
Loop Unrolling (IIR filter example)
• Loop unrolling localizes the data to reduce the activity at the inputs of the functional units; alternatively, two output samples are computed in parallel from two input samples.
Unrolling alone alters neither the switched capacitance nor the voltage. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation, the transformation yields a critical path of 3, so the supply voltage can be lowered.
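Original recursion:

$$Y_n = X_n + A\,Y_{n-1}$$

After unrolling by two:

$$Y_{n-1} = X_{n-1} + A\,Y_{n-2}, \qquad Y_n = X_n + A\,(X_{n-1} + A\,Y_{n-2})$$

After distributivity and constant propagation ($A^2$ precomputed):

$$Y_{n-1} = X_{n-1} + A\,Y_{n-2}, \qquad Y_n = X_n + A\,X_{n-1} + A^2\,Y_{n-2}$$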
![Page 7: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/7.jpg)
Loop Unrolling for Low Power
![Page 8: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/8.jpg)
Loop Unrolling for Low Power
![Page 9: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/9.jpg)
Loop Unrolling for Low Power
![Page 10: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/10.jpg)
Loop Unrolling for OPR
![Page 11: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/11.jpg)
DFG after Loop Unrolling
The estimated power-consumption reduction is now 9.4%.
![Page 12: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/12.jpg)
Effective Resource Utilization
[Figure: data-flow graph with adders (+), multipliers, delay elements (D) and output S, with numbered operations, shown before and after retiming. Accompanying tables list, cycle by cycle, which operations run on the multipliers and on the adders in each case.]
Retiming can reduce interconnect capacitance.
![Page 13: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/13.jpg)
Pipelining
![Page 14: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/14.jpg)
Switching Activity Reduction
(a) Average activity in a multiplier as a function of the constant value
(b) Parallel and serial implementations of an adder tree
![Page 15: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/15.jpg)
Associativity Transformation
![Page 16: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/16.jpg)
Interlaced Accumulation Programming for Low Power
![Page 17: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/17.jpg)
Associativity Transformation
![Page 18: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/18.jpg)
FIR Parallelization
Mahesh Mehendale, Sunil D. Sherlekar, G. Venkatesh, “Low-Power Realization of FIR Filters on Programmable DSP’s,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 6, No. 4, December 1998.
![Page 19: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/19.jpg)
FIR PARALLELIZATION
![Page 20: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/20.jpg)
FIR Filter Parallelization
![Page 21: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/21.jpg)
FIR parallelization: two working phases
![Page 22: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/22.jpg)
IIR filter recursive function
![Page 23: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/23.jpg)
Recursive Function
![Page 24: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/24.jpg)
Interlaced Accumulation Programming for Low Power
![Page 25: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/25.jpg)
Optimizing Power using Transformation
[Figure: INPUT FLOWGRAPH → transformations, guided by a search mechanism and power estimation → OUTPUT FLOWGRAPH]
• Local transformation primitives: associativity, distributivity, retiming, common sub-expression elimination
• Global transformation primitives: retiming, pipelining, look-ahead, associativity
• Search mechanism: simulated annealing (rejectionless), steepest descent, heuristics
• Power estimation
![Page 26: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/26.jpg)
Data-flow based transformations
• Tree-height reduction.
• Constant and variable propagation.
• Common subexpression elimination.
• Code motion.
• Dead-code elimination.
• The application of algebraic laws such as commutativity, distributivity and associativity.
• Most of the parallelism in an algorithm is embodied in the loops.
• Loop jamming, partial and complete loop unrolling, strength reduction, loop retiming and software pipelining.
• Retiming: maximize the resource utilization.
![Page 27: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/27.jpg)
Tree-height reduction
• Example of tree-height reduction using commutativity and associativity
• Example of tree-height reduction using distributivity
![Page 28: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/28.jpg)
Sub-expression elimination
• Logic expressions:
– Performed by logic optimization.
– Kernel-based methods.
• Arithmetic expressions:
– Search for isomorphic patterns in the parse trees.
– Example: a = x + y; b = a + 1; c = x + y;
– becomes: a = x + y; b = a + 1; c = a;
![Page 29: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/29.jpg)
Examples of other transformations
• Dead-code elimination:
– a = x; b = x + 1; c = 2 * x;
– a = x; can be removed if a is not referenced.
• Operator-strength reduction:
– a = x²; b = 3 * x;
– becomes: a = x * x; t = x << 1; b = x + t;
• Code motion:
– for (i = 1; i < a * b; i++) { }
– becomes: t = a * b; for (i = 1; i < t; i++) { } (valid only if the loop body modifies neither a nor b)
![Page 30: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/30.jpg)
Control-flow based transformations
• Model expansion:
– Expand subroutines and flatten the hierarchy.
– Useful to expand the scope of other optimization techniques.
– Problematic when the routine is called more than once.
– Example: x = a + b; y = a * b; z = foo(x, y); with foo(p, q) { t = q - p; return (t); }
– By expanding foo: x = a + b; y = a * b; z = y - x;
• Conditional expansion:
– Transform a conditional into parallel execution with the test at the end.
– Useful when the test depends on late signals.
– May preclude hardware sharing.
– Always useful for logic expressions.
– Example: y = ab; if (a) x = b + d; else x = bd; can be expanded to x = a(b + d) + a'bd, which simplifies to y = ab; x = y + d(a + b);
![Page 31: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/31.jpg)
Strength reduction
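[Figure: data-flow graphs before and after strength reduction]

$$X^2 + AX + B = X(X + A) + B$$

The direct form needs two multiplications (X·X and A·X) and two additions. The factored form needs one multiplication and two additions.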
![Page 32: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/32.jpg)
Strength Reduction
![Page 33: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/33.jpg)
DIGLOG multiplier
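$$C_{\mathrm{mult}}(n) = 253\,n^2, \qquad C_{\mathrm{add}}(n) = 214\,n, \qquad \text{where } n = \text{word length in bits}$$

$$A = 2^{j} + A_R, \qquad B = 2^{k} + B_R$$

$$A \times B = (2^{j} + A_R)(2^{k} + B_R) = 2^{j+k} + 2^{j} B_R + 2^{k} A_R + A_R B_R$$

Here j and k are the positions of the leading 1 bits of A and B, and A_R, B_R are the remaining bits. The first three terms need only shifts and additions. The error term A_R B_R is approximated the same way at the next iteration.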
| | 1st iter. | 2nd iter. | 3rd iter. |
|---|---|---|---|
| Worst-case error | -25% | -6% | -1.6% |
| Prob. of error < 1% | 10% | 70% | 99.8% |
With an 8 × 8 multiplier, the exact result is obtained after at most seven iteration steps (worst case).
![Page 34: L9 : Low Power DSP](https://reader035.vdocuments.pub/reader035/viewer/2022070404/56813b01550346895da39f69/html5/thumbnails/34.jpg)
Logarithmic Number System
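$$L_x = \log_2 |x|$$

$$L_{A \cdot B} = L_A + L_B, \qquad L_{A/B} = L_A - L_B$$

$$L_{A^2} = 2L_A \;(= L_A \ll 1), \qquad L_{\sqrt{A}} = L_A / 2 \;(= L_A \gg 1)$$

--> Significant strength reduction: multiplication and division become addition and subtraction, and squaring and square root become 1-bit shifts.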