A dynamically reconfigurable processor for H.264/AVC image prediction



ORIGINAL ARTICLE

Y. Hayakawa · A. Kanasugi (*)
Graduate School of Engineering, Tokyo Denki University, 2-2 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101-8457, Japan
e-mail: [email protected]

This work was presented in part at the 15th International Symposium on Artificial Life and Robotics, Oita, Japan, February 4–6, 2010

Artif Life Robotics (2010) 15:147–150
© ISAROB 2010
DOI 10.1007/s10015-010-0781-z

Yuki Hayakawa · Akinori Kanasugi

A dynamically reconfigurable processor for H.264/AVC image prediction

Received and accepted: April 11, 2010

Abstract H.264/AVC provides high video quality at substantially low bit rates. It is useful for saving and transferring video images captured by robot cameras. However, the computational complexity of H.264/AVC is very high. A high-speed general-purpose processor is necessary to process H.264/AVC, but it is difficult to use such a processor in a portable device. Therefore, an application-specific processor is necessary. Dynamic reconfiguration can virtually expand the circuit area within a limited chip area. Therefore, this article proposes a dynamically reconfigurable processor for H.264/AVC image prediction. H.264/AVC contains intra- and inter-prediction processes, which are not used at the same time. The proposed processor, which dynamically reconfigures those circuits, was designed and synthesized. Compared with a circuit without dynamic reconfiguration, the number of look-up tables (LUTs) was reduced to 93%, the number of flip-flops was reduced to 94%, and the maximum delay was about the same.

Key words H.264/AVC · Dynamic reconfiguration · Inter-prediction · Intra-prediction

1 Introduction

H.264/AVC is the latest video compression standard [1]. H.264/AVC provides high video quality at substantially low bit rates. It is useful for saving and transferring video images captured by robot cameras. However, the computational complexity of H.264/AVC is very high [2]. The required computation grows with the video resolution and the frame rate of the application, and video resolution increases every year. A high-speed general-purpose processor is necessary to process H.264/AVC. However, it is difficult to use such a processor in a portable device. Therefore, an application-specific processor is necessary.

H.264/AVC contains intra- and inter-prediction processes, a de-blocking filter process, a quantization process, an integer discrete cosine transform process, an encoding process, a decoding process, an inverse quantization process, and an inverse integer discrete cosine transform process. The intra- and inter-prediction processes are not used at the same time. In a general decoder, however, the intra- and inter-prediction circuits are implemented independently.

Dynamic reconfiguration can virtually expand the circuit area within a limited chip area. Although conventional reconfiguration requires halting the circuit for a few milliseconds, dynamic reconfiguration changes the circuit configuration during operation without halting the circuit. Therefore, a circuit with many functions can be implemented in a small circuit area [3,4].

Therefore, this article proposes a dynamically reconfigurable processor for H.264/AVC main profile image prediction.

2 H.264/AVC main profile intra- and inter-prediction

H.264/AVC contains intra- and inter-prediction processes, a de-blocking filter process, a quantization process, an integer discrete cosine transform process, an encoding process, a decoding process, an inverse quantization process, and an inverse integer discrete cosine transform process. A general decoder does not use the intra- and inter-prediction processes at the same time. The intra-prediction process uses neighboring samples of an N × N block (for example 4 × 4, 16 × 16, etc.). The inter-prediction process uses reference pictures (namely, pictures before and after the current picture). The intra- and inter-prediction circuits are implemented independently.

2.1 Inter-prediction process

Most of the inter-prediction process consists of sample interpolation. The luminance (luma) and chrominance (chroma) sample interpolation processes are different. The luma sample interpolation process calculates quarter samples using a 6-tap filter, whereas the chroma sample interpolation process calculates 1/8 samples. The luma sample interpolation process needs 448 additions for a 4 × 4 block. The chroma sample interpolation process needs 96 additions for a 2 × 2 block. Figure 1 shows the luma sample interpolation circuit.

The 6-tap filter for the luma sample interpolation process is calculated as follows:

$$p_{t1} = A - 5B + 20C + 20D - 5E + F, \qquad A \sim F:\ \text{8-bit integer samples} \tag{1}$$

$$p_1 = \mathrm{Clip}\left(\frac{p_{t1} + 16}{32}\right) \tag{2}$$

$$p_{t2} = G - 5H + 20I + 20J - 5K + L, \qquad G \sim L:\ \text{15-bit filtered samples } (p_{t1}) \tag{3}$$

$$p_2 = \mathrm{Clip}\left(\frac{p_{t2} + 512}{1024}\right) \tag{4}$$

$$\mathrm{Clip}(p) = \begin{cases} 0, & p < 0 \\ p, & 0 \le p \le 255 \\ 255, & p > 255 \end{cases} \tag{5}$$
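As an illustration, Eqs. 1 to 5 can be written as the following Python sketch. The function names are hypothetical and are not taken from the processor design, and the divisions by 32 and 1024 are assumed to be implemented as arithmetic right shifts by 5 and 10 bits.

```python
def clip(p: int) -> int:
    """Eq. 5: saturate a value to the 8-bit sample range 0..255."""
    return max(0, min(255, p))


def six_tap(s) -> int:
    """Eqs. 1 and 3: unnormalised 6-tap filter with taps (1, -5, 20, 20, -5, 1)."""
    a, b, c, d, e, f = s
    return a - 5 * b + 20 * c + 20 * d - 5 * e + f


def filter_integer_samples(samples) -> int:
    """Eq. 2: filtered sample p1 from six 8-bit integer samples A..F."""
    return clip((six_tap(samples) + 16) >> 5)  # assumed: /32 as a right shift


def filter_intermediate_samples(intermediates) -> int:
    """Eq. 4: filtered sample p2 from six intermediate values p_t1 (G..L)."""
    return clip((six_tap(intermediates) + 512) >> 10)  # assumed: /1024 as a shift


if __name__ == "__main__":
    flat = [100] * 6
    print(filter_integer_samples(flat))
    print(filter_intermediate_samples([six_tap(flat)] * 6))
```

In a flat area where all six samples equal 100, the unnormalised output of Eq. 1 is 3200 because the taps sum to 32, so both Eq. 2 and Eq. 4 return 100 again.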

The chroma sample interpolation process for one sample is calculated as follows:

$$p_{t3} = (8 - x)(8 - y) \cdot O + x(8 - y) \cdot P + (8 - x)\,y \cdot Q + x\,y \cdot R, \qquad O \sim R:\ \text{8-bit integer samples} \tag{6}$$

$$p_3 = \mathrm{Clip}\left(\frac{p_{t3} + 32}{64}\right) \tag{7}$$
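Eqs. 6 and 7 amount to a bilinear weighting of four integer samples. The sketch below is a minimal Python illustration; the function name is hypothetical, x and y are assumed to be the fractional offsets in units of 1/8 sample (0 to 7), and the division by 64 is assumed to be a right shift by 6 bits.

```python
def clip(p: int) -> int:
    """Eq. 5: saturate a value to the 8-bit sample range 0..255."""
    return max(0, min(255, p))


def chroma_sample(o: int, p: int, q: int, r: int, x: int, y: int) -> int:
    """Eqs. 6 and 7: interpolate one chroma sample from integer samples O..R.

    x and y are assumed to be 1/8-sample offsets in the range 0..7.
    """
    pt3 = ((8 - x) * (8 - y) * o + x * (8 - y) * p
           + (8 - x) * y * q + x * y * r)
    return clip((pt3 + 32) >> 6)  # assumed: /64 as a right shift


if __name__ == "__main__":
    print(chroma_sample(0, 100, 0, 100, 4, 0))
```

With x = y = 0 the four weights reduce to 64, 0, 0, and 0, so the result is the integer sample O itself.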

2.2 Intra-prediction process

The intra-prediction process uses the top and left neighbor samples. The intra-prediction process consists of the luma intra-prediction process for a 4 × 4 block, the luma intra-prediction process for a 16 × 16 block, and the chroma intra-prediction process for an 8 × 8 block. The luma intra-prediction process for a 4 × 4 block contains three calculations, as indicated below.

16 times the sum of 2 values.

16 times the sum of 4 values.

The sum of 8 values.

The luma intra-prediction process for a 16 × 16 block contains two calculations, as indicated below.

The sum of 32 values.

$$p[x, y] = \mathrm{Clip}\big((a + b \cdot (x - 7) + c \cdot (y - 7) + 16) \gg 5\big), \qquad \text{with } x, y = 0, 1, \ldots, 15 \tag{8}$$

$$a = 16\,(p[-1, 15] + p[15, -1]) \tag{9}$$

$$b = (5H + 32) \gg 6 \tag{10}$$

$$c = (5V + 32) \gg 6 \tag{11}$$

$$H = \sum_{x'=0}^{7} (x' + 1)\,(p[8 + x', -1] - p[6 - x', -1]) \tag{12}$$

$$V = \sum_{y'=0}^{7} (y' + 1)\,(p[-1, 8 + y'] - p[-1, 6 - y']) \tag{13}$$
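The calculation in Eqs. 8 to 13 can be sketched in Python as follows. The function and argument names are hypothetical: `top[x]` stands for p[x, −1], `left[y]` for p[−1, y], and `corner` for p[−1, −1], which Eqs. 12 and 13 reach when x′ = 7 or y′ = 7.

```python
def clip(p: int) -> int:
    """Eq. 5: saturate a value to the 8-bit sample range 0..255."""
    return max(0, min(255, p))


def plane_16x16(top, left, corner):
    """Eqs. 8 to 13 for a 16 x 16 luma block; returns pred[y][x]."""
    def p_top(x):   # p[x, -1]; x = -1 addresses the corner sample
        return corner if x < 0 else top[x]

    def p_left(y):  # p[-1, y]; y = -1 addresses the corner sample
        return corner if y < 0 else left[y]

    h = sum((i + 1) * (p_top(8 + i) - p_top(6 - i)) for i in range(8))    # Eq. 12
    v = sum((j + 1) * (p_left(8 + j) - p_left(6 - j)) for j in range(8))  # Eq. 13
    a = 16 * (left[15] + top[15])                                         # Eq. 9
    b = (5 * h + 32) >> 6                                                 # Eq. 10
    c = (5 * v + 32) >> 6                                                 # Eq. 11
    return [[clip((a + b * (x - 7) + c * (y - 7) + 16) >> 5)              # Eq. 8
             for x in range(16)] for y in range(16)]


if __name__ == "__main__":
    pred = plane_16x16([128] * 16, [128] * 16, 128)
    print(pred[0][0], pred[15][15])
```

When all neighboring samples are equal, H and V are zero, so b and c vanish and every predicted sample equals the neighboring value.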

The chroma intra-prediction process contains two calculations, as indicated below.

The sum of 8 values.

$$p[x, y] = \mathrm{Clip}\big((a + b \cdot (x - 3) + c \cdot (y - 3) + 16) \gg 5\big), \qquad \text{with } x, y = 0, 1, \ldots, 7 \tag{14}$$

$$a = 16\,(p[-1, 7] + p[7, -1]) \tag{15}$$

$$b = (34H + 32) \gg 6 \tag{16}$$

$$c = (34V + 32) \gg 6 \tag{17}$$

$$H = \sum_{x'=0}^{3} (x' + 1)\,(p[4 + x', -1] - p[2 - x', -1]) \tag{18}$$

$$V = \sum_{y'=0}^{3} (y' + 1)\,(p[-1, 4 + y'] - p[-1, 2 - y']) \tag{19}$$

3 Proposed processor

In a general decoder, the intra- and inter-prediction circuits are implemented independently. Here, the circuit area is reduced by dynamically reconfiguring these circuits. The proposed circuit is based on 13 luma sample interpolation processes, because this process is the largest. It consists of 91 adders. The connections between the adders are reconfigured by multiplexers. Some circuits were not incorporated into the reconfigurable circuit because doing so increased the circuit area. Reconfiguration eliminated 70 adders.

The proposed dynamically reconfigurable processor reconfigures the luma sample interpolation process, the chroma sample interpolation process, the luma intra-prediction process, and the chroma intra-prediction process. The luma intra-prediction process has 13 modes, and the chroma intra-prediction process has 4 modes. The processor calculates the luma sample interpolation process for a 4 × 4 block in 19 clock cycles and the chroma sample interpolation process for a 2 × 2 block in 2 clock cycles. It calculates the luma intra-prediction process for a 4 × 4 block in at most 4 clock cycles, the luma intra-prediction process for a 16 × 16 block in at most 66 clock cycles, and the chroma intra-prediction process for an 8 × 8 block in at most 18 clock cycles.

Fig. 1. Luma sample interpolation circuit

The proposed circuit has 13 blocks, numbered 0 to 12. The connections between those blocks and the adders are reconfigured by multiplexers. Seven blocks (Nos. 6–12) are almost the same type. Figure 2 shows a block diagram of one of those blocks. Figure 3 shows a block diagram of three blocks (Nos. 0, 1, and 2). These blocks are used for the luma sample interpolation process, the luma intra-prediction process, and the chroma intra-prediction process. For example, the luma intra-prediction process for a 16 × 16 block uses the shaded adders, which calculate H in Eq. 12. Figure 4 shows a block diagram of three blocks (Nos. 3, 4, and 5). These blocks are almost the same as the blocks described previously (Nos. 0, 1, and 2), except that they can also calculate p in Eq. 8. They are used for the luma sample interpolation process, the luma intra-prediction process, and the chroma intra-prediction process. For example, the luma intra-prediction process for a 16 × 16 block uses the shaded adders, which calculate p in Eq. 8. Nine blocks (Nos. 0–8) have at least six 8-bit inputs. Four blocks (Nos. 9–12), however, have at least six 15-bit inputs, because they have to calculate pt2 in Eq. 3. In addition, those blocks (Nos. 9–12) calculate pt1 in Eq. 1.

Fig. 2. Block diagram of one block (Nos. 6–12)

Fig. 3. Block diagram of three blocks (Nos. 0, 1, and 2)

Fig. 4. Block diagram of three blocks (Nos. 3, 4, and 5)

The reconfiguration of the 13 blocks and the connections between them are controlled by a control unit. The inputs of the control unit are sample data, mode (luma intra-prediction process, luma intra-prediction process for a 16 × 16 block, and so on), select signals, and so on. The calculation results are output four pixels (32 bits) at a time.

4 Evaluation

The dynamically reconfigurable prediction circuit for H.264/AVC decoding was synthesized using the Xilinx ISE 11.1 CAD software. The target field-programmable gate array (FPGA) is a Xilinx Virtex-5 (XC5VLX50T). The result was compared with that of a circuit without dynamic reconfiguration. Both circuits perform the calculations in the same number of clock cycles. Table 1 summarizes the logic synthesis results. The number of look-up tables (LUTs) was reduced to 93% and the number of flip-flops to 94% of those in the circuit without dynamic reconfiguration, and the maximum delay was about the same.
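To give a rough feel for these figures, the following Python sketch converts the clock cycle counts stated in Section 3 into latencies. It assumes, purely for illustration, that the clock period equals the maximum delay of the proposed circuit in Table 1 (11.678 ns). The article does not state the operating frequency, and the constant and function names are hypothetical.

```python
# Assumption for illustration only: clock period = maximum delay from Table 1.
MAX_DELAY_NS = 11.678

# Clock cycle counts stated in Section 3 (maximum values for intra-prediction).
CYCLES = {
    "luma interpolation, 4x4 block": 19,
    "chroma interpolation, 2x2 block": 2,
    "luma intra-prediction, 4x4 block": 4,
    "luma intra-prediction, 16x16 block": 66,
    "chroma intra-prediction, 8x8 block": 18,
}


def latency_ns(cycles: int, period_ns: float = MAX_DELAY_NS) -> float:
    """Latency of one block when every cycle lasts period_ns."""
    return cycles * period_ns


def max_frequency_mhz(period_ns: float = MAX_DELAY_NS) -> float:
    """Highest clock frequency permitted by the given period."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    print(f"max clock: {max_frequency_mhz():.1f} MHz")
    for name, cycles in CYCLES.items():
        print(f"{name}: {latency_ns(cycles):.1f} ns")
```

Under this assumption the clock could run at roughly 85 MHz, and the 19-cycle luma interpolation of a 4 × 4 block would take about 222 ns.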

5 Conclusion

We have proposed a dynamically reconfigurable processor for H.264/AVC image prediction. The proposed processor contains 13 blocks and reconfigures the luma sample interpolation process, the chroma sample interpolation process, and the intra-prediction process. Reconfiguration eliminated 70 adders. The proposed processor was designed and synthesized, and the result was compared with that of a circuit without dynamic reconfiguration. LUTs were reduced to 93%, flip-flops were reduced to 94%, and the maximum delay was about the same.

Acknowledgment This work was supported by Tokyo Denki University Science Promotion Fund (Q09J-01).

References

1. ITU-T recommendation H.264 (2005) Advanced video coding for generic audiovisual service

2. Chien S, Huang Y, Chen C, et al (2005) Hardware architecture design of video compression for multimedia communication systems. IEEE Commun Mag 43(8):123–132

3. Sato T, Watanabe H, Shiba K (2005) Implementation of dynamically reconfigurable processor DAPDNA-2. VLSI Design, Automation and Test, 2005 IEEE VLSI-TSA International Symposium, pp 323–324

4. Sugawara T, Ide K, Sato T (2004) Dynamically reconfigurable processor implemented with IP Flex’s DAPDNA technology. IEICE Trans Inf Syst E87-D(8):1997–2003

Table 1. Processor synthesis results

                            LUTs    Flip-flops    Delay (ns)
Proposed                    4181    1608          11.678
General                     4508    1705          11.878
Rate (proposed/general)     0.927   0.943         0.983
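The "Rate" row of Table 1 is the ratio of the proposed circuit to the general circuit (the one without dynamic reconfiguration). A minimal Python check, with hypothetical variable names, reproduces it from the two rows above it:

```python
# (proposed, general) pairs copied from Table 1.
RESULTS = {
    "LUTs": (4181, 4508),
    "Flip-flops": (1608, 1705),
    "Delay (ns)": (11.678, 11.878),
}


def rate(proposed: float, general: float) -> float:
    """Ratio of the proposed circuit to the general circuit, to 3 decimals."""
    return round(proposed / general, 3)


RATES = {name: rate(p, g) for name, (p, g) in RESULTS.items()}

if __name__ == "__main__":
    for name, value in RATES.items():
        print(f"{name}: {value}")
```

The ratios 0.927 and 0.943 correspond to the 93% and 94% figures quoted in the text.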