SungKyunKwan Univ.
VADA Lab.
L38: Low-Power Design of Viterbi Decoders
Sungkyunkwan University, School of Electrical and Computer Engineering
Jun-Dong Cho (조준동)
Viterbi Decoder
◈ Convolutional Encoder
K = 3 (Constraint Length) R = 1/2 (Rate)
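[Figure: the information sequence U = (u_j) enters a two-stage shift register (A1, A0); modulo-2 adders form the codeword V = (a_j, b_j).]
a_j = u_j + u_{j-1} + u_{j-2}
b_j = u_j + u_{j-2}
(+ denotes modulo-2 addition)
A (3, 1/2) convolutional encoder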
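The two encoder equations can be checked with a short Python sketch. It assumes the shift register starts at zero; the function name `conv_encode` is illustrative and not part of the slides. The input is the information sequence U = (0,0,1,0,1,0) used with the trellis diagram in this deck.

```python
def conv_encode(bits):
    """Rate-1/2, K=3 encoder: a_j = u_j ^ u_{j-1} ^ u_{j-2}, b_j = u_j ^ u_{j-2}."""
    u1 = u2 = 0  # shift register contents u_{j-1}, u_{j-2} (assumed to start at 0)
    codeword = []
    for u in bits:
        a = u ^ u1 ^ u2      # modulo-2 sum of all three taps
        b = u ^ u2           # modulo-2 sum of the first and last tap
        codeword.append((a, b))
        u1, u2 = u, u1       # shift the register by one position
    return codeword


print(conv_encode([0, 0, 1, 0, 1, 0]))
# [(0, 0), (0, 0), (1, 1), (1, 0), (0, 0), (1, 0)]
```

The printed pairs match the output codeword V = (00,00,11,10,00,10) given for that sequence.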
Viterbi Decoder
[Figure: trellis over the states 00, 10, 01, 11 for time steps 0 to 6; each branch is labeled with its two output code bits.]
Fig. 2. Trellis diagram for a (2, 1/2) convolutional code
Information sequence : U = (0,0,1,0,1,0,...)
Output codeword : V = (00,00,11,10,00,10,...)
◈ Viterbi Decoder
Viterbi Decoder
[Figure: Received signal -> BMU -> (BM) -> ACSU -> (SP) -> SMU -> Decoded data; the ACSU reads and writes the PMM.]
Viterbi decoder structure
Branch Metric Unit (BMU) : The branch metrics measure the difference between the received symbol and the symbol that causes each transition between states in the trellis.
Add-Compare-Select Unit (ACSU) : To find the survivor path entering each state, the branch metric of each transition is added to the corresponding partial path metric (PM) stored in the path metric memory (PMM). The new partial path metric is compared with the new partial path metrics of all other transitions entering that state. The transition with the minimum partial path metric is chosen as the survivor path of the state. The path metric of the survivor path of each state is then stored back into the PMM.
Survivor Memory Unit (SMU) : The survivor paths are stored in the SMU. During the decoding stage, a traceback mechanism is applied to the SMU to output the decoded data.
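The interplay of the three units can be sketched in software for the K = 3, R = 1/2 code of this deck. This is a minimal hard-decision sketch and not the hardware architecture: Hamming distance is assumed as the branch metric, the state numbering (state = 2*u_{j-1} + u_{j-2}) and the names `branch_output`, `next_state`, and `viterbi_decode` are illustrative, and the whole block is traced back at once instead of using a fixed survivor depth.

```python
INF = float("inf")


def branch_output(state, u):
    """Code bits (a, b) emitted when input u arrives in the given state."""
    u1, u2 = state >> 1, state & 1          # u_{j-1}, u_{j-2}
    return (u ^ u1 ^ u2, u ^ u2)


def next_state(state, u):
    return (u << 1) | (state >> 1)


def viterbi_decode(received):
    n_states = 4
    pm = [0] + [INF] * (n_states - 1)       # PMM: encoder assumed to start in state 00
    survivors = []                          # SMU: per time step, (previous state, input bit)
    for r in received:
        new_pm = [INF] * n_states
        decision = [None] * n_states
        for s in range(n_states):
            if pm[s] == INF:
                continue                    # state not reachable yet
            for u in (0, 1):
                out = branch_output(s, u)
                bm = (out[0] != r[0]) + (out[1] != r[1])   # BMU: Hamming distance
                ns = next_state(s, u)
                candidate = pm[s] + bm                     # ACSU: add
                if candidate < new_pm[ns]:                 # ACSU: compare and select
                    new_pm[ns] = candidate
                    decision[ns] = (s, u)
        pm = new_pm
        survivors.append(decision)
    # Traceback from the state with the smallest path metric.
    state = min(range(n_states), key=lambda s: pm[s])
    decoded = []
    for decision in reversed(survivors):
        state, u = decision[state]
        decoded.append(u)
    return decoded[::-1]


V = [(0, 0), (0, 0), (1, 1), (1, 0), (0, 0), (1, 0)]
print(viterbi_decode(V))   # [0, 0, 1, 0, 1, 0]
```

Decoding the codeword V from the trellis example returns the information sequence U, because that path accumulates a path metric of zero.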
Viterbi Decoder
⑴ Low power ACSU VLSI architecture
▶ Conventional ACSU VLSI architecture
Butterfly structure
Viterbi Decoder
[Figure: butterfly with predecessor states sa and sb, each connected to the successor states S0 and S1.]
Viterbi Decoder
Architecture of conventional ACSU
[Figure: for each successor state S (S0 and S1), two adders compute PM_{i-1}(sa) + BM_i(sa,S) and PM_{i-1}(sb) + BM_i(sb,S), and a comparator (Comp) selects the smaller sum as PM_i(S).]
― Algorithm
Viterbi Decoder [SKKU. Solution]
☞ The area and power of the low power ACSU design are reduced by 20% and 30%, respectively, compared with the conventional ACSU design.
PM_{i-1}(sa) + BM_i(sa,S0) > PM_{i-1}(sb) + BM_i(sb,S0)
PM_{i-1}(sa) - PM_{i-1}(sb) > BM_i(sb,S0) - BM_i(sa,S0)
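The two inequalities are the same decision written two ways: moving the terms across the inequality turns a comparison of two sums into a comparison of two differences. The sketch below only checks this equivalence on small integer metrics; the function names are illustrative, and it does not model the circuit of the low power ACSU.

```python
import itertools


def select_conventional(pm_a, pm_b, bm_a, bm_b):
    """Add first, then compare the two candidate path metrics."""
    sum_a, sum_b = pm_a + bm_a, pm_b + bm_b
    return ("sb", sum_b) if sum_a > sum_b else ("sa", sum_a)


def select_difference(pm_a, pm_b, bm_a, bm_b):
    """Compare the path metric difference with the branch metric difference."""
    if pm_a - pm_b > bm_b - bm_a:
        return ("sb", pm_b + bm_b)
    return ("sa", pm_a + bm_a)


# Exhaustive check over a small range of metric values.
metrics = range(8)
assert all(
    select_conventional(*m) == select_difference(*m)
    for m in itertools.product(metrics, repeat=4)
)
print(select_difference(5, 2, 0, 2))   # ('sb', 4)
```

In the printed case the path through sb wins because 5 - 2 > 2 - 0, which agrees with comparing the sums 5 and 4 directly.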
▶ Low power ACSU VLSI architecture [C-Y Tsui, ISLPED’99]
Viterbi Decoder [SKKU. Solution]
※ Glitch minimization [Raghunathan, DAC’96]
(a) Low power ACSU architecture (b) Conventional ACSU architecture
☞ The power consumption of architecture (a) is more than 17% larger than that of architecture (b) because of glitch power dissipation.
Viterbi Decoder [SKKU. Solution]
[Figure: two datapaths with inputs A, B, C, D and outputs X, Y, built from a comparator (<), 2:1 multiplexers and adders (+).]
(a) compare-add (b) add-compare
※ Glitches in control logic
Viterbi Decoder [SKKU. Solution]
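[Figure: datapath with inputs A, B, C, D and outputs X, Y, in which the multiplexer select signal S is gated with the clock (CLK) through an AND gate (&); the figure is annotated with Fs=0, Fs=1 and the product A·B.]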
⑵ Low power traceback VLSI architecture
▶ Systolic Viterbi traceback decoder [J. Sparso’91]
Viterbi Decoder
[Figure: the ACSU feeds a chain of trace-back units: Trace-Back Unit 1, 2, 3, ..., 10.]
The structure of the systolic Viterbi decoder
Viterbi Decoder
[Figure: trellis over the states 00, 10, 01, 11 for time steps 0 to 6, annotated with the path metric of each state and the decision vector of each time step.]
Sequence of states of the trace-back method
Received codeword : V = (00,00,11,10,00,10,...)
Viterbi Decoder
[Figure: register contents of the systolic trace-back units at time units 1 to 4, 10 to 12, 19, 20 and 24. At each time unit the ACSU feeds a 4-bit decision vector and the 2-bit state with the smallest path metric (for example 0000 and 00XX) into the chain T1 to T10, every stage shifts by one position, and the decoded bits ("0", "1") leave the end of the chain. Survivor depth = 5K.]
※ Problems with the systolic array decoder
The systolic array Viterbi decoder takes the decision vector and the smallest path metric from the ACSU as inputs and outputs the decoded bit by shifting every register on every cycle.
Because all data in the TBU shift on every cycle, the switching activity of the registers causes large dynamic power consumption, almost 80% of the total power consumption.
Viterbi Decoder
Viterbi Decoder [SKKU. Solution]
▶ Our low power trace-back unit
[Figure: register contents of the low power trace-back unit at time units 1 to 3, 9 to 11 and 19 to 21. The ACSU output passes through the CONTROL BLOCK, each 4-bit decision vector is stored in the register array T1 to T9, and only the 2-bit trace-back values move along the array during trace-back.]
After the decision vector and the smallest path metric generated by the ACSU are transferred to the Control Block (CB), the CB outputs them in the correct cycle using a counter and a multiplexer.
The register array, which stores the trace-back values from the CB, produces the decoded bit not by shifting the whole upper 4-bit decision vector, as in the classical TBU, but by shifting to the left only the lower 2 bits, which carry the state with the smallest path metric.
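A rough way to see why this helps is to count how many register bits are rewritten per cycle. The sketch below is only a counting exercise under stated assumptions, not a power model: each stage is assumed to hold a 4-bit decision vector and a 2-bit state field, the classical TBU is assumed to shift both fields in every stage, and the low power unit is assumed to write one new decision vector per cycle while shifting only the 2-bit fields. The depth of 10 stages and all function names are illustrative.

```python
VECTOR_BITS = 4   # decision vector width (one bit per state for a 4-state code)
STATE_BITS = 2    # field that carries the state with the smallest path metric


def classical_bits_per_cycle(depth):
    """Every stage reloads its decision vector and its state field each cycle."""
    return depth * (VECTOR_BITS + STATE_BITS)


def low_power_bits_per_cycle(depth):
    """One decision vector is written in place; only the state fields shift."""
    return VECTOR_BITS + depth * STATE_BITS


depth = 10  # assumed number of trace-back stages
print(classical_bits_per_cycle(depth), low_power_bits_per_cycle(depth))   # 60 24
```

The counts ignore the control block, the counter and the multiplexer, so they indicate the direction of the saving rather than its size.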
Viterbi Decoder [SKKU. Solution]
◈ Experimental Result (area 11%, power 40%)
Viterbi Decoder [SKKU. Solution]
[Chart: Area in gates (axis 0 to 8000) versus K = 2, 3, 4 for the Trace-back Unit and the Low Power Trace-back Unit.]
[Chart: Power Dissipation in uW (axis 0 to 1600) versus K = 2, 3, 4 for the Trace-back Unit and the Low Power Trace-back Unit.]
⑶ Low Power Asynchronous Viterbi Decoder [Y.h.Lee, Stanford]
▶ Algorithm
Viterbi Decoder [Stanford Solution]
[Figure: traceback processing at time n and at time n+1; the two traced paths meet at the converge point.]
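① Initialization: trace back through a trellis five times the constraint length and store the resulting path.
② Loop
A. Trace and compare: select an arbitrary initial state and start the traceback. While following the route, compare it with the stored route at each node.
B. If the compared values are equal, stop the traceback and discard the stored route. If they are not equal, repeat step A.
③ Repeat step ② for each input signal.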
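The loop can be sketched in Python as a traceback that stops as soon as it meets the route stored from the previous traceback. This is one reading of step B and not the Stanford implementation: it assumes that once the two routes meet, the older part of the stored route stays valid and only the newer part is overwritten. The table layout `pred[t][s]` (predecessor at time t of state s at time t+1) and the function name are illustrative.

```python
def traceback_with_early_stop(pred, start_state, stored_path):
    """Trace back from start_state and stop at the converge point.

    pred[t][s]     : state at time t that leads to state s at time t + 1
    stored_path[t] : state at time t on the previously stored route
    Returns the updated route and the number of traceback steps taken.
    """
    new_path = list(stored_path) + [start_state]
    state = start_state
    steps = 0
    for t in range(len(pred) - 1, -1, -1):
        state = pred[t][state]
        steps += 1
        if state == stored_path[t]:
            break                 # converge point: the rest of the route is reused
        new_path[t] = state       # routes differ here: overwrite the stored node
    return new_path, steps


# Hypothetical predecessor table for a 4-state trellis and three time steps.
pred = [[0, 2, 0, 2],
        [0, 2, 0, 3],
        [1, 2, 0, 3]]
stored = [0, 0, 2]
print(traceback_with_early_stop(pred, 1, stored))   # ([0, 0, 2, 1], 1)
print(traceback_with_early_stop(pred, 3, stored))   # ([2, 3, 3, 3], 3)
```

In the first call the routes merge after a single step, so the traceback stops early; in the second call they never merge and the full depth is traced.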
Viterbi Decoder [Stanford Solution]
▶ Implementation
Self-timed TBU block diagram
Viterbi Decoder [Stanford Solution]
[Figure: blocks: Input Port; Previous Path memory with Address and RD/WR Control; Shift Register; MUX; Memory Management Unit; Surviving Path Memory with Address and RD/WR Control; Trace Back Unit; Ring Oscillator; Comparison Logic. Signals: Request from ACS; Acknowledge to ACS if the path is found; Request if the path is not found; Self-precharge & Self-requesting if not found.]
① The self-timed TBU consumes no power while it waits for a request signal.
② The ACS sends a request signal in order to dump the state decision data.
③ The TBU reads the earlier surviving path memory and the previous path memory and compares them.
④ If they are not equal, the TBU updates the previous path memory, performs self-precharging and self-requesting, and then repeats step ③. If they are equal, it goes to step ⑤.
⑤ The TBU sends an acknowledgement signal to the ACS and self-precharges for the next request signal from the ACS.
Viterbi Decoder
Low-Power Bit-Serial Viterbi Decoder
H. Suzuki, Y. N. Chang, K. K. Parhi, “Low-Power Bit-Serial Viterbi Decoder for 3rd Generation W-CDMA System”, 1999, CICC
◈ Abstract
This paper presents a low-power bit-serial Viterbi decoder chip with coding rate R = 1/3 and constraint length K = 9 (256 states).
The Add-Compare-Select (ACS) units have been designed using bit-serial arithmetic, and a power-efficient trace-back scheme and an application-specific memory have been developed for the trace-back operation.
The chip was implemented in 0.5 µm CMOS technology and operates at 20 Mbps at 3.3 V and at 2 Mbps at 1.8 V. The power dissipation is only 9.8 mW at 2 Mbps and 1.8 V.
Low-Power Bit-Serial Viterbi Decoder
◈ Architecture Overview
256 bit-serial ACS units are placed in parallel, and each ACS unit includes state metric storage.
In the trace-back block, a 256 x 48-bit memory is required for a survivor path length of 48.
Low-Power Bit-Serial Viterbi Decoder
Bit-Serial Viterbi Decoder Chip Diagram
Low-Power Bit-Serial Viterbi Decoder
◈ Bit-Serial ACS Unit
Bit-serial ACS unit
Low-Power Bit-Serial Viterbi Decoder
Each ACS unit has three full-adders.
Two of them add the state metric and the branch metric, and the third compares the two new state metrics.
This reduces the overhead to 17% of the whole area of the ACS unit.
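A bit-serial ACS step with three full-adders can be sketched as follows. The sketch assumes LSB-first processing, unsigned metrics that fit in the chosen word width, and a comparison done as a subtraction (a + NOT b + 1) whose final carry tells whether a >= b; the word width and all names are illustrative and are not taken from the paper.

```python
def full_adder(a, b, carry_in):
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def to_bits(value, width):
    """LSB-first bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(a_bits, b_bits):
    """One full-adder reused for every bit, with the carry kept between bits."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out                      # result modulo 2**width


def serial_less_than(a_bits, b_bits):
    """a < b via a + ~b + 1: the final carry is 1 exactly when a >= b."""
    carry = 1
    for a, b in zip(a_bits, b_bits):
        _, carry = full_adder(a, b ^ 1, carry)
    return carry == 0


def bit_serial_acs(sm0, bm0, sm1, bm1, width=8):
    sum0 = serial_add(to_bits(sm0, width), to_bits(bm0, width))   # full-adder 1
    sum1 = serial_add(to_bits(sm1, width), to_bits(bm1, width))   # full-adder 2
    pick1 = serial_less_than(sum1, sum0)                          # full-adder 3
    return from_bits(sum1 if pick1 else sum0), int(pick1)


print(bit_serial_acs(5, 2, 3, 1))   # (4, 1)
```

The returned pair is the surviving state metric and the decision bit; here the second path wins because 3 + 1 < 5 + 2.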
Low-Power Bit-Serial Viterbi Decoder
◈ Trace Back Strategy
Trace Back operation
Low-Power Bit-Serial Viterbi Decoder
The memory size required in this paper is twice as large as the minimum memory size (256 x 2).
After 48 “TRACE BACK” operations, 24 decoded bits are obtained consecutively.
Two separate pointers, a read pointer and a write pointer, are required, and the read pointer must move three times as fast as the write pointer.
This operation was implemented with single-port memories using a time-multiplexed access method.
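One way to account for the factor of three is to count memory columns per decoded block. The sketch below assumes that while 24 new decision columns are written, the read pointer has to cover 48 columns of trace-back followed by 24 columns of decoding; this accounting is an interpretation of the numbers above, and the function name is illustrative.

```python
def read_to_write_ratio(traceback_len=48, decode_len=24):
    """Columns read per column written for one block of decoded bits."""
    writes = decode_len                    # new columns stored during the block
    reads = traceback_len + decode_len     # trace back first, then decode
    return reads / writes


print(read_to_write_ratio())   # 3.0
```

With a single-port memory, this ratio suggests roughly three read accesses sharing time with each write access.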