
Complexity modeling for context-based adaptive binary arithmetic coding (CABAC) in H.264/AVC decoder

    Szu-Wei Lee and C.-C. Jay Kuo

Ming Hsieh Department of Electrical Engineering and Signal and Image Processing Institute

University of Southern California, Los Angeles, CA 90089-2564, USA. E-mails: [email protected] and [email protected]

    ABSTRACT

One way to save power in an H.264 decoder is for the H.264 encoder to generate decoder-friendly bit streams. Following this idea, a decoding complexity model of context-based adaptive binary arithmetic coding (CABAC) for H.264/AVC is investigated in this research. Since different coding modes affect the number of quantized transformed coefficients (QTCs) and motion vectors (MVs) and, consequently, the complexity of entropy decoding, an encoder equipped with a complexity model can estimate the complexity of entropy decoding and choose the coding mode that yields the best tradeoff among rate, distortion and decoding complexity. The complexity model consists of two parts: one for source data (i.e., QTCs) and the other for header data (i.e., the macro-block (MB) type and MVs). Thus, the proposed CABAC decoding complexity model of an MB is a function of the QTCs and associated MVs, which is verified experimentally. The proposed CABAC decoding complexity model provides good estimation results for various bit streams. Practical applications of this complexity model are also discussed.

    1. INTRODUCTION

H.264/AVC [1, 2] is an emerging video coding standard from ITU-T and ISO/IEC. It has been selected as a video coding format in the HD-DVD and Blu-ray specifications. H.264/AVC provides a large number of coding modes to improve the coding gain, and its encoder may search all possible modes to find the one that minimizes the rate-distortion (RD) cost. Due to the use of a large set of coding modes, the complexity of H.264/AVC decoding is about 2.1 to 2.9 times that of H.263 decoding [3]. For mobile applications, the coded video bit stream is decoded on portable devices, and reducing the decoding complexity is critical for power saving.

One way to save power in H.264/AVC decoding is for the H.264/AVC encoder to generate decoder-friendly bit streams. That is, if the H.264/AVC encoder has a target decoding platform in mind, it can generate a bit stream that is easy to decode on that platform. This motivates us to study the decoding complexity model and its applications. If a decoding complexity model is given, the H.264/AVC encoder can use it to estimate the decoding complexity associated with various coding modes and select the one that yields the best tradeoff among rate, distortion and decoding complexity. A decoder consists of several basic building modules, e.g., inverse DCT, inverse quantization, entropy decoding and motion compensation, and we can examine the decoding complexity model for each module separately. It is desirable to consider computationally intensive modules first.

We examine the decoder complexity model for context-based adaptive binary arithmetic coding (CABAC) in H.264/AVC in this work. This is an interesting and important problem for several reasons. First, it is observed that entropy decoding demands a higher computational complexity in higher bit rate video (e.g., high definition video) due to the existence of a larger number of non-zero quantized transformed coefficients (QTCs) and motion vectors (MVs). Second, the entropy decoding module is the most computationally expensive one when

H.264/AVC decoding is implemented on a graphics processing unit (GPU) [4] platform such as the Nvidia GeForce 7 and ATI Radeon X1300 series. Since GPUs support hardware decoding of motion compensation and the de-blocking filter, entropy decoding becomes the main bottleneck in decoding complexity. Finally, H.264/AVC offers two entropy coding tools, namely context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC). CABAC achieves better RD performance than CAVLC at the cost of a higher decoding complexity [2]. Thus, we select the complexity modeling of CABAC decoding as our main task here.


Decoder complexity models have been studied in the past. They can be classified into two categories: system-specific complexity models [5-12] and generic complexity models [13, 14]. The MPEG-4 system-specific video complexity verifier (VCV) model was described in [5], where the numbers of boundary macro-blocks (MBs) and non-boundary MBs decoded per second are estimated and the decoding complexity is modeled as a function of these two numbers. However, the decoding complexities of MBs coded by different modes in MPEG-4 can differ, so the VCV model is not very accurate. To address this shortcoming, Valentim et al. [6, 7] proposed an enhanced MPEG-4 decoding complexity model that accounts for the fact that MBs encoded with different modes have different decoding complexities. They used the maximal decoding time to measure the decoding complexity of MBs encoded by each mode block by block, and the sum of the individual MBs' complexities gives the total decoding complexity of the bit stream.

The H.264/AVC system-specific complexity models were studied in [8-10] and [11, 12]. The decoder complexity model for the H.264/AVC motion compensation process (MCP) was first investigated in [8-10], where it is simply modeled as a function of the number of interpolation filters. Later, the decoder MCP complexity model was enhanced in [11, 12]. The enhanced model considers the number of interpolation filters as well as the relationship between MVs, which affects cache management efficiency and, thus, decoding complexity. Generic complexity models for variable length decoding (VLD) were described in [13, 14], where the sum of magnitudes of non-zero QTCs and the sum of run lengths of zero QTCs are estimated, and the entropy decoding complexity is then modeled as a function of these two parameters.

The complexity models mentioned above are, however, not suitable for H.264/AVC entropy decoding for several reasons. First, existing system-specific models primarily target estimation of the MCP decoding complexity and do not work well for entropy decoding. Second, even though some complexity models can estimate the entropy decoding complexity for decoders using VLD, they are not accurate for H.264/AVC CABAC decoding, since CABAC is more complicated than VLD. Third, CABAC can be used to encode all syntax elements in H.264/AVC, including QTCs, MVs, MB types and other flags. This flexibility makes its modeling more challenging compared with that in [13, 14]. To address these issues, a new complexity model for H.264/AVC CABAC decoding is proposed in this work.

The rest of this paper is organized as follows. An H.264/AVC CABAC decoding complexity model is presented in Sec. 2. The application of this model and its integration with an H.264/AVC encoder are discussed in Sec. 3. The proposed model as well as its application are verified experimentally in Sec. 4. Finally, concluding remarks are given in Sec. 5.

    2. PROPOSED CABAC DECODING COMPLEXITY MODEL

The proposed CABAC decoding complexity model is obtained by examining the CABAC decoding process carefully. The CABAC and QTC coding processes in H.264/AVC are first reviewed in Sec. 2.1. Then, the decoding complexity model is proposed in Sec. 2.2.

    2.1. Review of context-based adaptive binary arithmetic coding (CABAC)

In H.264/AVC, CABAC can be used to encode all syntax elements, including QTCs, MVs, reference frame indices and other flags [15]. The CABAC encoding and decoding processes are shown in Fig. 1.

As shown in Fig. 1(a), the CABAC encoding process consists of three stages: binarization, context modeling and binary arithmetic coding (BAC). First, a non-binary syntax element is mapped into a binary sequence in the binarization stage, where each symbol in the binary sequence is referred to as a bin. If the input is already

a binary syntax element, it bypasses the binarization stage and goes directly into the context modeling stage. There are four basic binarization schemes in H.264/AVC: (1) the unary code, (2) the truncated unary (TU) code, (3) the k-th order Exp-Golomb code (EGk), and (4) the fixed-length code. In addition to these four basic schemes, the first and the third schemes can be combined with a cut-off value S, which is known as UEGk, to encode non-binary syntax elements such as motion vector differences (MVDs, the differences between MVs and predicted MVs) and the absolute values minus one of QTCs. To be more specific, the binary sequence generated by the UEGk binarization scheme consists of both a prefix and a suffix part if the value of the syntax element, C, is larger than the cut-off value S. The prefix part is generated by the unary binarization scheme to represent the value of S, while the suffix part is generated by the EGk binarization scheme to represent the value of (C − S). On the other hand, if C ≤ S, the resultant binary sequence includes only the prefix part generated by the unary binarization scheme.

Figure 1. The CABAC encoding and decoding processes.
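The UEGk construction can be made concrete with a short sketch. The helper below is illustrative only (the names and the exact truncation handling are ours, not the standard's or the reference software's); the suffix loop follows the EGk growth rule described above.

```python
# Illustrative UEGk binarization sketch: a (truncated) unary prefix up to
# the cut-off S, followed by a k-th order Exp-Golomb suffix for (C - S).
def ueg_k(value: int, k: int, cutoff: int) -> tuple:
    """Return (prefix_bins, suffix_bins) as bit strings for value >= 0."""
    if value < cutoff:
        return "1" * value + "0", ""      # unary prefix only, no suffix
    prefix = "1" * cutoff                 # prefix represents the cut-off value S
    rest = value - cutoff                 # the remainder (C - S) goes into EGk
    suffix = ""
    while rest >= (1 << k):               # grow the Exp-Golomb order until it fits
        suffix += "1"
        rest -= 1 << k
        k += 1
    suffix += "0"
    if k > 0:
        suffix += format(rest, "0%db" % k)  # k-bit binary tail
    return prefix, suffix

# UEG0 with cut-off 14 (used for QTC levels, see Sec. 2.2):
print(ueg_k(3, 0, 14))    # small value: ('1110', '') -- prefix only
print(ueg_k(20, 0, 14))   # large value: prefix of 14 ones plus an EG0 suffix
```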

After the binarization stage, each bin is fed into the context-modeling stage, which decides the probability model to be used by the binary arithmetic coder (BAC) to encode this bin. Finally, the BAC generates the bit stream and updates the context model. In H.264/AVC, a bypass BAC is allowed so as to speed up encoding when the distribution of coded bins is approximately uniform. For example, the sign bits of QTCs and the suffix part of a binary sequence generated by the UEGk binarization scheme are coded by the bypass BAC.

As shown in Fig. 1(b), bit streams are fed into either the regular binary arithmetic decoding (BAD) or the bypass BAD in the CABAC decoding process, depending on the distribution of coded bins. Then, the inverse binarization process is utilized to reconstruct the non-binary syntax element. Finally, the context model is updated.

The process to encode the QTCs in one 4x4 block consists of three stages, as shown in Fig. 2. In the first stage, a one-bit variable, coded block flag, is used to indicate whether there exist non-zero QTCs in this block. In the second stage, which is called the significant map stage, the significant coeff flag (a one-bit array) is utilized to indicate the positions of the non-zero QTCs after mapping the 2D QTC array into a 1D array in the zig-zag scan order. If there is at least one non-zero QTC in the 1D array, a one-bit variable, last significant coeff flag, is further used to indicate whether the current QTC is the last non-zero QTC in this block. Finally, in the third stage, which is also known as the level information stage, the absolute value minus one of each non-zero QTC and its sign bit are coded. Please note that the binarization stage is skipped if syntax elements are binary. Therefore, the one-bit variables (i.e., coded block flag, significant coeff flag and last significant coeff flag) are fed directly into the context-modeling stage and then coded by the regular BAC, while the sign bits of QTCs are coded using the bypass BAC.
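To make the three-stage flow concrete, the sketch below lists the bins generated for one 4x4 block. Context selection and other standard subtleties are omitted, and all identifiers are ours; it is a simplified reading of the description above, not the normative process.

```python
# Simplified bin generation for one 4x4 block of QTCs, mirroring the three
# stages above. Input is the QTC array already in zig-zag scan order.
def qtc_bins(qtc):
    bins = [("coded_block_flag", int(any(c != 0 for c in qtc)))]   # stage 1, regular BAC
    if bins[0][1] == 0:
        return bins
    last = max(i for i, c in enumerate(qtc) if c != 0)
    for i, c in enumerate(qtc[: last + 1]):                        # stage 2: significance map
        bins.append(("significant_coeff_flag", int(c != 0)))       # regular BAC
        if c != 0:
            bins.append(("last_significant_coeff_flag", int(i == last)))
    for c in reversed(qtc[: last + 1]):                            # stage 3: level information
        if c != 0:
            bins.append(("abs_level_minus1", abs(c) - 1))          # UEG0 binarized, S = 14
            bins.append(("sign_flag", int(c < 0)))                 # bypass BAC
    return bins

print(qtc_bins([7, 0, -2, 1] + [0] * 12))
```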

In addition to the coding of QTCs, CABAC can also be used to encode other syntax elements. For example, the one-bit variables mb skip flag and transform size flag are used to indicate, respectively, whether the current MB is skipped and which spatial-domain transform size is used (i.e., the 4x4 or 8x8 transform). For the MB type, a three-bit variable is used to indicate whether the current MB is coded by the P16x16, P16x8, P8x16 or P8x8 inter prediction mode, while a two-bit variable is needed to identify whether the current MB is coded by the I4MB or I16MB intra type.

MVs are first estimated using MV predictors, and then the MVD is fed into the UEGk binarization scheme. The prefix and the suffix parts of the resulting binary sequence are coded by the regular BAC and the bypass BAC, respectively. Similarly, the intra prediction direction is first estimated by the intra prediction predictor. Then, a one-bit variable is used to indicate whether the actual and estimated intra prediction directions are equal. If they are not the same, their difference is fed into a three-bit fixed-length binarization scheme and coded by the regular BAC. The reference frame index is binarized by the unary binarization scheme and then coded by the regular BAC. More detailed information on the CABAC encoding process for other syntax elements can be found in [15].

Figure 2. The CABAC encoding process for quantized transformed coefficients.

    2.2. CABAC Decoding Complexity Model

The proposed CABAC decoding complexity model consists of two parts. One is used to model the decoding complexity for source data, i.e., QTCs, while the other is for header data, such as MVDs, reference frame indices, MB types and intra prediction types. It can be seen that the execution time of the CABAC decoding process in Fig. 1(b) depends on the number of loop iterations, i.e., the number of BAD executions. Therefore, the number of BAD executions should be included in the CABAC decoding complexity model. Since the bypass BAD is computationally cheap, our model only counts regular BAD executions, and this count is an important parameter in our decoding complexity model.

The CABAC decoding complexity for source data, $C_{src}$, is modeled as a function of the number of regular BAD executions $N_{bad,1}$, the number of non-zero QTCs $N_{nz}$, the position of the last non-zero QTC $P_{nz}$, and the number of non-skipped MBs $N_{mb}$. Mathematically, it can be written as

$$C_{src} = \alpha_{bad,1} N_{bad,1} + \alpha_{nz} N_{nz} + \alpha_{p} P_{nz} + \alpha_{mb} N_{mb}, \quad (1)$$

where $\alpha_{bad,1}$, $\alpha_{nz}$, $\alpha_{p}$ and $\alpha_{mb}$ are weights. Please note that the number of regular BAD executions is used to model the decoding complexity in the level information stage, while the other three factors measure the decoding complexity of the coded block flag and of the significant map stage.

The CABAC decoding complexity for header data, $C_{hdr}$, is modeled as a function of the number of regular BAD executions $N_{bad,2}$, the number of MVs $N_{mv}$, the number of reference frames $N_{ref}$ and the number of skipped MBs $N_{skipped}$, of the following form:

$$C_{hdr} = \alpha_{bad,2} N_{bad,2} + \alpha_{mv} N_{mv} + \alpha_{ref} N_{ref} + \alpha_{skipped} N_{skipped}, \quad (2)$$


where $\alpha_{bad,2}$, $\alpha_{mv}$, $\alpha_{ref}$ and $\alpha_{skipped}$ are weights. Similar to the complexity model for the source data, the number of regular BAD executions is included in the header data model to capture the decoding complexities of syntax elements such as the MB type, the MB subtype, the transform flag, the intra prediction type, and MVDs. The numbers of MVs and reference frames are used to model the complexity of decoding MVs and reference frame indices, respectively.
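Read literally, Eqs. (1) and (2) are weighted sums of counted features, as in the sketch below. The weight values shown are placeholders standing in for the trained coefficients reported in Sec. 4, and the function names are ours.

```python
# Two-part CABAC decoding complexity model, Eqs. (1) and (2).
def c_src(w, n_bad1, n_nz, p_nz, n_mb):
    """Source-data complexity: regular BAD count, non-zero QTC count,
    last non-zero QTC position, and non-skipped MB count."""
    return w["bad1"] * n_bad1 + w["nz"] * n_nz + w["p"] * p_nz + w["mb"] * n_mb

def c_hdr(w, n_bad2, n_mv, n_ref, n_skipped):
    """Header-data complexity: regular BAD count, MV count, reference
    frame count, and skipped MB count."""
    return (w["bad2"] * n_bad2 + w["mv"] * n_mv
            + w["ref"] * n_ref + w["skipped"] * n_skipped)

# Example with made-up counts and placeholder weights (ms per count).
src_w = {"bad1": 2e-5, "nz": 2e-5, "p": 1e-5, "mb": 3e-5}
print(c_src(src_w, n_bad1=5_000_000, n_nz=2_000_000, p_nz=1_500_000, n_mb=400_000))
```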

The number of regular BAD executions in our model can be calculated easily. As mentioned before, a non-binary syntax element is fed into the binarization process to generate a binary sequence, and the length of this binary sequence determines the number of BAD executions. Since the binarization process is usually implemented by table lookup or a few addition and shift operations [16], the number of regular BAD executions can be obtained once a non-binary syntax element is given. For example, the UEG0 binarization scheme with cut-off value S = 14 is used to generate binary sequences for the absolute values minus one of QTCs in the level information stage. The prefix part of the binary sequence is coded by the regular BAC, while the suffix part is coded by the bypass BAC. Therefore, the number of regular BAD executions is equal to the length of the prefix part of the binary sequence. In other words, when the UEGk binarization scheme is adopted, the number of regular BAD executions is the minimum of the cut-off value and the value of the non-binary syntax element.
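That counting rule amounts to a one-liner (a sketch of the rule exactly as stated, with our naming):

```python
# Regular-BAD executions for one UEGk-binarized syntax element: only the
# unary prefix (capped at the cut-off S) is decoded by the regular BAD;
# the EGk suffix, if present, goes through the bypass BAD.
def num_regular_bad(value: int, cutoff: int = 14) -> int:
    return min(value, cutoff)
```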

Weights in (1) and (2) can be obtained as follows. First, several pre-encoded bit streams are selected, and Intel's VTune performance analyzer is used to measure the number of clock ticks in CABAC decoding, which gives the measurements of $C_{src}$ and $C_{hdr}$. Second, the count of each contributing factor is collected separately for those pre-encoded bit streams. Finally, a constrained least squares method is used to find the best fit of the weights. The proposed CABAC decoding complexity model is verified experimentally in Sec. 4.
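A possible realization of this fitting step is sketched below, assuming the constraint is non-negativity of the weights (the paper does not spell out the constraint) and assuming the feature counts and measured times are collected elsewhere.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Fit the model weights by constrained least squares: X holds per-bit-stream
# feature counts (one row per training stream), y the measured CABAC
# decoding times in ms.
def fit_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    res = lsq_linear(X, y, bounds=(0.0, np.inf))  # keep weights non-negative
    return res.x

# Synthetic example with the source model's 4 features and 40 streams.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1e7, size=(40, 4))
y = X @ np.array([1.6e-5, 1.8e-5, 1.0e-5, 2.9e-5]) + rng.normal(0.0, 0.5, 40)
print(fit_weights(X, y))
```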

    3. DECODER-FRIENDLY H.264/AVC SYSTEM DESIGN

The application of the proposed decoding complexity model is discussed in this section. Consider the following two scenarios. First, an H.264/AVC encoder generates a single bit stream for different decoding platforms without any decoding complexity model. Second, the encoder generates several bit streams for different decoding platforms separately, according to their computational power, so that the resultant bit stream is easy to decode on a particular platform. For the latter case, the decoding complexity models should be incorporated in the H.264/AVC encoder so that the encoder can estimate the likely decoding complexity and then generate decoder-friendly bit streams.

    3.1. Rate-Distortion and Decoding Complexity Optimization

In the conventional H.264/AVC encoder, there are two rate-distortion optimization (RDO) processes. The first one decides the optimal inter prediction mode among the P8x8, P8x4, P4x8 and P4x4 modes for one 8x8 block. The second one determines the optimal inter or intra prediction mode for one 16x16 MB among the P_Skip, P16x16, P16x8, P8x16, I16MB, I8MB and I4MB modes and the four 8x8 blocks whose optimal inter prediction modes have been decided by the first RDO process. Both RDO processes consist of the following steps. First, since different inter prediction modes have different numbers of MVs, the RDO process performs motion estimation (ME) to find the best MV if the current MB is to be coded by an inter prediction mode; otherwise, it finds the best intra prediction direction if an intra prediction mode is adopted for this MB. Second, the RDO process performs the actual encoding (e.g., the spatial-domain transform, quantization and entropy encoding) and decoding tasks to get the reconstructed video frame, so as to determine the associated bit rate and distortion. Then, the RDO process evaluates the RD cost function given by

$$J_{rd}(blk_i|QP,m) = D(blk_i|QP,m) + \lambda_m R(blk_i|QP,m), \quad (3)$$

where $D(blk_i|QP,m)$ and $R(blk_i|QP,m)$ are the distortion and the bit rate of block $blk_i$ for a given coding mode $m$ and quantization parameter (QP), respectively. Finally, the RDO process finds the mode that yields the minimal RD cost. The minimization of the RD cost function in (3) implies that the RDO process decides the best mode that minimizes the distortion $D(blk_i|QP,m)$ while meeting the rate constraint, namely $R(blk_i|QP,m) \le R_{st,i}$. Note that the Lagrangian multiplier $\lambda_m$ in (3) is used to control the bit rate; thus, it depends on QP.


The original RD optimization problem can be extended to the joint problem of rate, distortion and decoding complexity optimization (RDC). In other words, not only the rate constraint but also the decoding complexity constraint is considered when minimizing the distortion in the RDO process. We can introduce the decoding complexity cost into the original RD cost function (3) via

$$J_{rdc}(blk_i|QP,m) = D(blk_i|QP,m) + \lambda_m R(blk_i|QP,m) + \lambda_c C(blk_i|QP,m), \quad (4)$$

where $C(blk_i|QP,m)$ is the decoding complexity of block $blk_i$ for given coding mode $m$ and QP. Similar to the Lagrangian multiplier $\lambda_m$ for rate control, the Lagrangian multiplier $\lambda_c$ is used to control the decoding complexity. The algorithm to select a proper $\lambda_c$ for a given decoding complexity constraint (i.e., $C(blk_i|QP,m) \le C_{st,i}$) is discussed in Sec. 3.3.
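The resulting mode decision is a straightforward arg-min over the candidate modes, as sketched below; here, evaluate is a stand-in for the encoder's per-mode trial encoding plus the complexity estimate from the proposed model, and all names are ours.

```python
# RDC mode decision, Eq. (4): choose the mode with the minimal joint cost.
def best_mode(modes, evaluate, lambda_m, lambda_c):
    """evaluate(mode) -> (distortion, rate_bits, est_complexity_ms)."""
    def j_rdc(mode):
        d, r, c = evaluate(mode)
        return d + lambda_m * r + lambda_c * c
    return min(modes, key=j_rdc)
```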

    3.2. Relationship Between Bit Rates and Decoding Complexity

Before addressing the problem of decoding complexity control, the relationship between bit rate and decoding complexity is studied. As mentioned before, the CABAC encoding process consists of three stages: binarization, context modeling and binary arithmetic coding. The length of the binary sequence generated in the binarization stage determines the bit number of a non-binary syntax element $c$. The bit rate $R$ can be expressed as $R = L \cdot h_2(c)$, where $L$ and $h_2(c)$ are the length and the average bit rate of the binary sequence converted from the non-binary syntax element $c$, respectively.

Next, consider the number of regular BAD executions, which is an important parameter in our decoding complexity model. In H.264/AVC, the UEGk binarization scheme with cut-off value S is usually used to generate binary sequences for non-binary syntax elements such as MVDs and the absolute values minus one of QTCs. The number of regular BAD executions is equal to the length of the generated binary sequence if the value of the non-binary syntax element is less than the cut-off value. Since the value of a non-binary syntax element is rarely larger than the cut-off value, the number of regular BAD executions, $N_{bad}$, is proportional to the bit number of non-binary syntax elements, i.e., $R = L \cdot h_2(c) = N_{bad} \cdot h_2(c)$.

The relationship between an H.264/AVC rate model and our CABAC decoding complexity model is studied below. Here, we consider the H.264/AVC rate model in [17], which consists of a source bit part and a header bit part. The source bit rate model is a function of the quantization step (QS) and is expressed as

$$R_{src} = \frac{SATC}{QS},$$

where SATC is the sum of the absolute values of the transformed coefficients for one 4x4 block. Since SATC/QS can be viewed as the sum of the absolute values of QTCs for a 4x4 block, the source bit rate model can be further written as

$$R_{src} = \sum_i |QTC_i|,$$

where $QTC_i$ represents the $i$-th QTC in one 4x4 block. Now, we consider the CABAC decoding complexity model for source data in (1). In the high bit rate case, the number of regular BAD executions, which is used to model the decoding complexity in the level information stage, dominates the total decoding complexity, while the other three terms, which model the decoding complexities for one-bit variables in the significant map stage, are less important. Since H.264/AVC adopts the UEGk binarization scheme with cut-off value S = 14 to generate binary sequences for QTCs, the number of regular BAD executions is equal to $\sum_i \min(|QTC_i|, 14)$, or $\sum_i |QTC_i|$ in most cases. Thus, the CABAC decoding complexity for the source data is proportional to the source bit rate.

The header bit rate model in [17] is a function of the number of MVs $N_{mv}$, the number of non-zero MV elements $N_{nzMVe}$ and the number of intra MBs $N_{intra}$, which is written as

$$R_{hdr} = \alpha (N_{nzMVe} + \beta N_{mv}) + N_{intra} \, b_{intra}, \quad (5)$$

where $\alpha$ and $\beta$ are model parameters and $b_{intra}$ is the average header bit number for intra MBs. As for the proposed CABAC decoding complexity model for the header data in (2), it includes the same term (i.e., the number of MVs) as the header bit rate model. In addition, the number of regular BAD executions is used to model the decoding complexities of MVDs and intra prediction types for inter and intra MBs, respectively. As mentioned before, the number of regular BAD executions is proportional to the bit number of a non-binary syntax element (i.e., MVDs and intra prediction types in this case). Thus, the proposed CABAC complexity model for the header data is proportional to the number of header bits, too.


Fig. 3 shows the relationships between the actual source/header bit rate and the CABAC decoding complexity for source/header data for high bit rate video streams. The experimental results demonstrate that the source/header CABAC decoding complexity is proportional to the source/header bit rate. This linear relationship between bit rates and CABAC decoding complexity is used in the decoding complexity control scheme described in the next subsection.

    Figure 3. Relationship between bit rates and CABAC decoding complexity in high bit rate video streams.

    3.3. Decoding Complexity Control

Decoding complexity control is a process that determines a control parameter, such as the Lagrangian multiplier $\lambda_c$ in (4), so that the RDO process can decide the best coding mode, i.e., the one that minimizes the distortion while satisfying the decoding complexity constraint. There exists some analogy between rate control and complexity control.

Let $R_c(\cdot)$ be the rate-complexity (RC) mapping function, which estimates the bit rate under a given decoding complexity. The RDC problem in (4) can be reduced to either the rate-distortion (RD) problem if $R_{st,i} \le R_c(C_{st,i})$ (which means that the rate constraint is tighter than the complexity constraint) or the complexity-distortion (CD) problem if $R_c(C_{st,i}) < R_{st,i}$ (which means that the complexity constraint is tighter than the rate constraint). Mathematically, the CD optimization problem can be written as

$$\min D(blk_i|QP,m) \ \text{ s.t. } \ C(blk_i|QP,m) \le C_{st,i} \;\Leftrightarrow\; \min \{ D(blk_i|QP,m) + \lambda_c C(blk_i|QP,m) \} \quad (6)$$

or

$$\min D(blk_i|QP,m) \ \text{ s.t. } \ R_c(C(blk_i|QP,m)) \le R_c(C_{st,i}) \;\Leftrightarrow\; \min \{ D(blk_i|QP,m) + \lambda_m R_c(C(blk_i|QP,m)) \}. \quad (7)$$

We want to solve the CD optimization problem in the form of (7) rather than (6), since the Lagrangian multiplier $\lambda_m$ in (7) is easier to obtain.

The Lagrangian multiplier $\lambda_m$ can be determined by the following steps. First, QS can be computed by the linear rate model [17] once the estimated rate $R_c(C(B_i|m,QP))$ for the given CABAC decoding complexity is obtained. Then, QS is used to determine QP. Finally, the Lagrangian multiplier $\lambda_m$ is obtained by the formula $\lambda_m = 0.85 \times 2^{(QP-12)/3}$, as suggested in [18].
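These three steps can be sketched as follows. The inversion of the linear rate model and the QS-to-QP mapping (QS = 0.625 · 2^(QP/6), the standard H.264 quantization-step rule) are our reading of the procedure, and the clamping to the legal QP range is added by us for safety.

```python
import math

# Estimated rate -> QS (linear rate model) -> QP -> lambda_m.
def lambda_m_from_rate(est_rate_bits: float, satc: float) -> float:
    qs = satc / max(est_rate_bits, 1e-9)      # linear rate model: R_src = SATC / QS
    qp = round(6.0 * math.log2(qs / 0.625))   # invert QS = 0.625 * 2^(QP / 6)
    qp = min(max(qp, 0), 51)                  # legal H.264 QP range
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)    # formula from [18]
```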

The above steps indicate that the CD optimization problem can be further converted into an RD optimization problem, and the new RD optimization problem has a tighter rate constraint than the original one. In other words, it is desirable to lower the rate so as to reduce the CABAC decoding complexity when the decoding complexity constraint is tighter than the rate constraint. The proposed CABAC decoding complexity control scheme is incorporated into the H.264/AVC rate control algorithm [19], as described below.

Figure 4. (a) The rate control algorithm and (b) the proposed joint rate and decoding complexity control algorithm in H.264/AVC.

As shown in Fig. 4(a), the rate control algorithm in H.264/AVC consists of several stages. First, the frame-layer rate control decides the frame bit number T, which depends on the picture type (i.e., P or B) and the buffer status in a hypothetical reference decoder, which is needed to avoid buffer underflow or overflow in actual decoders. Next, the bit number, $B_i$, of the current basic unit is determined by

$$B_i = T \cdot \frac{MAD_i^2}{\sum_k MAD_k^2},$$

where $MAD_i$ is the predicted mean absolute difference of the current basic unit. In other words, more bits are assigned to the basic unit if its distortion is higher. Then, the source bit number, $B_{src}$, of the current basic unit is obtained by subtracting the estimated header bit number, $B_{hdr}$, from the basic unit bit number $B_i$. After that, the linear rate model is used to determine QS from the source bit number $B_{src}$, and QS is then used to decide QP. The resultant QP of the basic unit is clipped via $QP = \min(QP_{previous\,frame} + 3, QP)$ for quality smoothness, and it is used for mode decision and the subsequent encoding process. Finally, the statistical information of the rate and distortion models is updated according to the coding results of the current frame and basic unit. Please note that rate control is not performed for the first I, P and B frames, since the rate and distortion models have no statistical information yet. The basic unit can be one MB or several MBs, which is one of the encoding options in the H.264/AVC encoder.
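The MAD-proportional allocation in the formula above can be sketched directly (our naming; T is the frame budget from the frame-layer stage):

```python
# Basic-unit bit allocation: split the frame budget T among basic units in
# proportion to their squared predicted MADs.
def allocate_basic_unit_bits(T, predicted_mads):
    total = sum(m * m for m in predicted_mads)
    return [T * (m * m) / total for m in predicted_mads]

# Units with higher predicted distortion receive more bits.
print(allocate_basic_unit_bits(1000.0, [4.0, 2.0, 2.0]))  # ~[666.7, 166.7, 166.7]
```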

The proposed joint H.264/AVC rate and decoding complexity control algorithm is shown in Fig. 4(b). There are two rate-complexity (RC) mapping functions in the algorithm: one for the source data and the other for the header data. They are of the form

$$R_{c,src}(C_{src}(BU_i|QP,m)) = \gamma_{src} \, C_{src}(BU_i|QP,m), \qquad R_{c,hdr}(C_{hdr}(BU_i|QP,m)) = \gamma_{hdr} \, C_{hdr}(BU_i|QP,m), \quad (8)$$

where $C_{src}(BU_i|QP,m)$ and $C_{hdr}(BU_i|QP,m)$ are the CABAC decoding complexities of basic unit $BU_i$ for the source data and the header data, respectively, for given coding mode $m$ and QP, and $\gamma_{src}$ and $\gamma_{hdr}$ are RC function coefficients. These coefficients are trained in the RDO process: the RDO process performs the encoding task to get the source and header rates, and it estimates the CABAC decoding complexities for the source and header data using the proposed complexity model. Then, the bit rates and decoding complexities of the most recent 100 MBs are used to train the coefficients of the RC mapping functions with the method of least squares.

The proposed complexity control algorithm consists of the following steps. First, the estimated header rate $B_{hdr}$ is used to estimate the CABAC decoding complexity for the header data, $C_{hdr}$, by dividing $B_{hdr}$ by $\gamma_{hdr}$. Then, the CABAC decoding complexity for the source data, $C_{src}$, is obtained by subtracting $C_{hdr}$ from the CABAC decoding complexity of the current basic unit, $C_b$, given by

$$C_b = \frac{C_{st} - C_{sum}}{N_{nb}}, \quad (9)$$

where $C_{st}$, $C_{sum}$ and $N_{nb}$ are the CABAC decoding complexity constraint, the sum of the allocated CABAC decoding complexity, and the number of non-coded basic units, respectively. After that, the source rate $B_{src,c}$ for the given CABAC decoding complexity constraint is determined from the source RC mapping function and the CABAC decoding complexity for source data $C_{src}$. Finally, the minimum of $B_{src,c}$ and $B_{src}$ is used to decide QS and QP for the subsequent encoding process.
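Putting these steps together, a sketch of the per-basic-unit budget computation follows; gamma_src and gamma_hdr are the trained RC coefficients of Eq. (8), and all names are ours.

```python
# Complexity-driven source-rate budget for the current basic unit.
def source_rate_budget(c_st, c_sum, n_nb, b_hdr, b_src, gamma_src, gamma_hdr):
    c_b = (c_st - c_sum) / n_nb    # Eq. (9): remaining complexity per unit
    c_hdr = b_hdr / gamma_hdr      # header complexity implied by header bits
    c_src = c_b - c_hdr            # complexity budget left for source data
    b_src_c = gamma_src * c_src    # source rate allowed by that budget
    return min(b_src_c, b_src)     # take the tighter of the two budgets
```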

    4. EXPERIMENTAL RESULTS

We conducted experiments to verify the proposed CABAC decoding complexity model and the decoding complexity control scheme on a PC platform. The CPU was a 1.7 GHz Pentium M with 512 MB of RAM, and the operating system was Windows XP. The reference JM9.4 decoder was optimized with Intel MMX technology. We selected the Foreman and Mobile CIF sequences as training sequences and pre-encoded 40 training bit streams. Each bit stream consisted of 270 frames and was encoded with a different QP, i.e., QP = 2, 4, 6, ..., 40. The Intel VTune performance analyzer 8.0 was used to measure the CABAC decoding complexities of the source and the header data for all pre-encoded bit streams. The numbers of clock ticks measured by Intel VTune were divided by 1.7 × 10⁷ to get the decoding times of the source data and the header data in milliseconds. Then, the counts of all factors in the proposed CABAC decoding complexity model were collected for those pre-encoded bit streams. Finally, this information was used to train the weights of the decoding complexity model for the source data, i.e., $\alpha_{bad,1}$, $\alpha_{nz}$, $\alpha_p$ and $\alpha_{mb}$, and those for the header data, i.e., $\alpha_{bad,2}$, $\alpha_{mv}$, $\alpha_{ref}$ and $\alpha_{skipped}$. The constrained least squares method was used to determine all weight coefficients. For the source data model, we have

$$\alpha_{bad,1} = 1.63 \times 10^{-5}, \quad \alpha_{nz} = 1.83 \times 10^{-5}, \quad \alpha_{p} = 1.01 \times 10^{-5}, \quad \alpha_{mb} = 2.872 \times 10^{-5}. \quad (10)$$

For the header data model, we obtain

$$\alpha_{bad,2} = 2.34 \times 10^{-5}, \quad \alpha_{mv} = 5.83 \times 10^{-5}, \quad \alpha_{ref} = 3.278 \times 10^{-4}, \quad \alpha_{skipped} = 1.661 \times 10^{-4}. \quad (11)$$

Next, these weights were used in the proposed decoding complexity model to estimate the decoding complexities of four HD (1920x1080) bit streams: Blue sky, Toy and calendar, Sunflower, and Rush hour.

A performance comparison between the decoding complexity estimated by the proposed complexity model and the actual decoding complexity measured by Intel VTune for the four test video sequences is shown in Table 1. We see that the proposed complexity model provides good estimation results for these test sequences: the errors are within 7%.


Table 1. Comparison between the actual and estimated CABAC decoding complexities (in ms) for the source and the header data for the Blue sky, Toy and calendar, Sunflower, and Rush hour video sequences.

| Sequence | Bit rate | Actual (Src) | Est. (Src) | Error (%) | Actual (Hdr) | Est. (Hdr) | Error (%) |
|----------|----------|--------------|------------|-----------|--------------|------------|-----------|
| Blue sky | 29.40M | 685.31 | 717.16 | 4.65 | 180.04 | 188.06 | 4.45 |
| | 25.60M | 597.59 | 620.16 | 3.78 | 173.27 | 178.35 | 2.93 |
| | 20.48M | 476.13 | 492.60 | 3.46 | 162.00 | 164.11 | 1.30 |
| | 15.36M | 358.31 | 359.60 | 0.36 | 142.19 | 148.49 | 4.43 |
| | 10.24M | 241.34 | 230.23 | 4.60 | 124.19 | 129.28 | 4.10 |
| Toy and calendar | 29.40M | 679.40 | 700.19 | 3.06 | 272.33 | 276.23 | 1.43 |
| | 25.60M | 572.36 | 599.68 | 4.77 | 260.74 | 260.19 | 0.21 |
| | 20.48M | 458.26 | 464.42 | 1.34 | 236.84 | 239.15 | 0.98 |
| | 15.36M | 324.23 | 337.34 | 4.04 | 210.70 | 212.42 | 0.82 |
| | 10.24M | 222.22 | 211.91 | 4.64 | 169.59 | 173.55 | 2.33 |
| Sunflower | 29.40M | 640.93 | 651.60 | 1.66 | 286.14 | 290.36 | 1.47 |
| | 25.60M | 560.79 | 557.20 | 0.64 | 268.96 | 277.16 | 3.05 |
| | 20.48M | 429.40 | 428.42 | 0.23 | 247.26 | 256.62 | 3.78 |
| | 15.36M | 316.13 | 309.00 | 2.26 | 220.96 | 227.71 | 3.06 |
| | 10.24M | 203.74 | 198.40 | 2.62 | 177.83 | 182.55 | 2.65 |
| Rush hour | 29.40M | 683.28 | 728.30 | 6.59 | 207.75 | 219.77 | 5.79 |
| | 25.60M | 583.13 | 617.37 | 5.87 | 208.33 | 214.68 | 3.05 |
| | 20.48M | 436.53 | 463.35 | 6.14 | 204.45 | 204.47 | 0.01 |
| | 15.36M | 301.06 | 309.62 | 2.84 | 188.04 | 188.95 | 0.48 |
| | 10.24M | 183.02 | 184.28 | 0.69 | 156.38 | 159.91 | 2.25 |

Experimental results using the H.264/AVC encoder with the CABAC decoding complexity model and the decoding complexity control scheme are shown in Fig. 5, where results for Blue sky, Toy and calendar, Sunflower, and Rush hour are shown in rows 1, 2, 3 and 4, respectively. The x-axis is the decoding time, and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3). The first point on the x-axis corresponds to the case without decoding complexity control. Compared with the case without decoding complexity control, the Toy and calendar sequence with a target CABAC complexity of 700 ms loses 0.57 dB in PSNR but saves 28.25% in decoding complexity. The Sunflower sequence with a target CABAC complexity of 700 ms loses 0.54 dB in PSNR but saves 22.62% in decoding complexity. Finally, the Rush hour sequence with a target CABAC complexity of 750 ms loses 0.46 dB in PSNR but saves 21.32% in decoding complexity.

Figure 5. CABAC decoding complexity control for four test sequences: Blue sky (row 1), Toy and calendar (row 2), Sunflower (row 3), and Rush hour (row 4), where the x-axis is the decoding time and the y-axis is the deviation in complexity control (column 1), the complexity saving (column 2) and the coding performance (column 3).

The above results clearly demonstrate that the H.264/AVC encoder with the proposed CABAC decoding complexity model and the decoding complexity control scheme can generate bit streams that meet different decoding complexity constraints. The deviations between the actual and target decoding complexities are all less than 10%. Moreover, a significant amount of decoding complexity can be saved at the cost of some PSNR loss. This is useful in a mobile broadcasting environment, where multiple mobile devices can receive broadcast/streaming video in real time with lower power consumption.

    5. CONCLUSION

The CABAC decoding complexity model and its application to H.264/AVC encoding were presented in this work. An encoder with the proposed complexity model and complexity control scheme can generate a bit stream that is suitable for a receiving platform with a severe power constraint; the coded bit stream balances the RD requirement against the computational power of the decoding platform. The proposed decoding complexity model consists of two parts: one for the source data and the other for the header data. The decoding complexity model was verified experimentally, and it was shown that the model provides good estimation results for several test sequences. The coding and complexity performance of the H.264/AVC codec with the proposed decoding complexity model and the decoding complexity control scheme was also demonstrated: the H.264/AVC encoder can generate bit streams that meet different decoding complexity constraints accurately, and the resultant bit streams can be decoded at a much lower complexity at the cost of a small PSNR loss.

    REFERENCES

1. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. on Circuits and Systems for Video Technology 13, pp. 560-576, July 2003.

2. J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, Video coding with H.264/AVC: Tools, performance, and complexity, IEEE Circuits and Systems Magazine 4, pp. 7-28, 2004.

3. M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, H.264/AVC baseline profile decoder complexity analysis, IEEE Trans. on Circuits and Systems for Video Technology 13, pp. 704-716, July 2003.

4. G. Shen, G. P. Gao, S. Li, H. Y. Shum, and Y. Q. Zhang, Accelerate video decoding with generic GPU, IEEE Trans. on Circuits and Systems for Video Technology 15, pp. 685-693, May 2005.

5. ISO/IEC 14496-2, Information technology - coding of audiovisual objects - Part 2: Visual, Dec. 1999.

6. J. Valentim, P. Nunes, and F. Pereira, An alternative complexity model for the MPEG-4 video verifier mechanism, in IEEE Int. Conf. on Image Processing (ICIP 2001), pp. 461-464, Oct. 2001.

7. J. Valentim, P. Nunes, and F. Pereira, Evaluating MPEG-4 video decoding complexity for an alternative video complexity verifier model, IEEE Trans. on Circuits and Systems for Video Technology 12, pp. 1034-1044, Nov. 2002.

8. Y. Wang and S. F. Chang, Complexity adaptive H.264 encoding for light weight streams, in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. II-25-28, May 2006.

9. Y. Wang, Low-complexity H.264 decoder: motion estimation and mode decision, [Online]. Available: http://www.ee.columbia.edu/~ywang/Research/camed.html.

10. Y. Wang, Resource constrained video coding/adaptation, Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University.

11. S. W. Lee and C.-C. J. Kuo, Complexity modeling for motion compensation in H.264/AVC decoder, in IEEE Int. Conf. on Image Processing (ICIP), Aug. 2007.

12. S. W. Lee and C.-C. J. Kuo, Motion compensation complexity model for decoder-friendly H.264 system design, in IEEE Int. Workshop on Multimedia Signal Processing (MMSP 2007), Oct. 2007.

13. M. van der Schaar and Y. Andreopoulos, Rate-distortion-complexity modeling for network and receiver aware adaptation, IEEE Trans. on Multimedia 7, pp. 471-479, June 2005.

14. Y. Andreopoulos and M. van der Schaar, Complexity-constrained video bitstream shaping, IEEE Trans. on Signal Processing 55, pp. 1967-1974, May 2007.

15. D. Marpe, H. Schwarz, and T. Wiegand, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard, IEEE Trans. on Circuits and Systems for Video Technology 13, pp. 620-636, July 2003.

16. J. L. Chen, Y. K. Lin, and T. S. Chang, A low cost context adaptive arithmetic coder for H.264/MPEG-4 AVC video coding, in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), May 2007.

17. D. K. Kwon, M. Y. Shen, and C.-C. J. Kuo, Rate control for H.264 video with enhanced rate and distortion models, IEEE Trans. on Circuits and Systems for Video Technology 17, pp. 517-529, May 2007.

18. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, Rate-constrained coder control and comparison of video coding standards, IEEE Trans. on Circuits and Systems for Video Technology 13, pp. 688-703, July 2003.

19. Z. G. Li, F. Pan, K. P. Lim, X. Lin, and S. Rahardja, Adaptive rate control for H.264, in IEEE Int. Conf. on Image Processing (ICIP 2004), pp. 745-748, Oct. 2004.