15. H.264/AVC


Post on 16-Aug-2020


TRANSCRIPT

  • 1

    Page 1

    15. H.264/AVC

    Prof. Chia-Wen Lin (林嘉文)
    Department of Electrical Engineering
    National Tsing Hua University
    03-5731152
    [email protected]

    MPEG-4 Parts

    Part I: Systems
    Part II: Visual
    Part III: Audio
    Part IV: Conformance
    Part V: Reference software
    Part VI: DMIF (Delivery Multimedia Integration Framework)
    Part VII: Optimized software for MPEG-4 tools
    Part VIII: MPEG-4 on IP framework
    Part IX: Reference hardware description
    Part X: Advanced Video Coding (AVC)

  • 2

    Page 2

    MPEG-4 Parts

    Visual – Part 2 (ISO/IEC 14496-2)
    – Video
      • Coding of natural video
    – SNHC (Synthetic-Natural Hybrid Coding)
      • Facial & body animation
      • Graphic coding
    – Texture coding
    – Sprite coding

    Visual – Part 10 (ISO/IEC 14496-10)
    – AVC (Advanced Video Coding)
    – JVT (Joint Video Team), ISO + ITU-T
    – Focused solely on coding of natural video
    – Very high coding efficiency

    MPEG-4 AVC

    Working Draft 2 – January 2002
    Committee Draft (CD) – May 2002
    Final CD – July 2002
    FDIS (Final Draft International Standard) – December 2002

  • 3

    Page 3

    Video Coding Standards

    MPEG-2
    – State of the art, 1994

    MPEG-4 Video, Part 2
    – ASP (Advanced Simple Profile)
    – State of the art, 1999
    – ~1.5x coding gain over MPEG-2 (on average)

    MPEG-4 AVC, Part 10
    – State of the art, 2002
    – ~2x coding gain over MPEG-2 (on average)
    – Final Draft Standard in Dec 2002

    The Design Goals of MPEG-4 AVC

    • High compression efficiency
    • Flexible application to delay constraints appropriate to a variety of services
    • Error resilience capability
    • Complexity scalability
    • Full specification of decoding (no mismatch)
    • High quality application
    • Network friendliness

  • 4

    Page 4

    Applications

    • Conversational services for video telephony and video conferencing
    • Live or pre-coded video streaming services
    • Video in multimedia messaging services (MMS)

    Video Coding Hierarchy

    • Sequence, consisting of
    • Pictures, consisting of
    • Slices, consisting of
    • Macroblocks, consisting of
    • Blocks, consisting of
    • Pixels / Pels

    [Figure labels beside the hierarchy: NAL, VCL]

    Note: for interlaced video, a picture consists of either one frame or two fields

  • 5

    Page 5

    VCL and NAL

    • H.264 consists of
      – Video Coding Layer (VCL)
        • Performs the tasks associated with video coding
      – Network Abstraction Layer (NAL)
        • Implements video-specific support features for a variety of networks
        • Seamless and easy integration into all current transmission protocols
        • Easier packetization and better information priority control

    The Features of VCL (1/3)

    • Transformation
      – Integer 4x4 block transform for residual coding
      – Hadamard
        • A 4x4 transform on the DC coefficients of the 4x4 blocks in a 16x16 macroblock
        • A 2x2 transform on the DC coefficients of the 4x4 chroma blocks in an 8x8 chroma block

  • 6

    Page 6

    The Features of VCL (2/3)

    • Quantization
    • Motion Estimation
      – Variable block-size motion prediction (7 block sizes: 16x16, 8x8, 4x4, 16x8, 8x16, 8x4, 4x8)
      – Integer, 1/2-, and 1/4-pixel motion vector accuracy
      – Multiple reference frames (max. 15) may be used for prediction

    The Features of VCL (3/3)

    • Entropy coding:
      – Context-based Adaptive Variable Length Coding (CAVLC)
      – Context-based Adaptive Binary Arithmetic Coding (CABAC)
    • Others:
      – Space-domain intra prediction (10 prediction modes)
      – De-blocking loop filter
      – Motion vector prediction
      – Slice structure
      – Interlace coding tools

  • 7

    Page 7

    Frame Types

    • I-frame
    • P-frame
    • B-frame
    • SP- and SI-frame
      – SP and SI frames provide functionalities for bit-stream switching, splicing, random access, VCR functionalities, and error resilience/recovery

    Picture Formats

    • Color sequences using 4:2:0 chroma sub-sampling

    [Figure: locations of luminance and chrominance samples over time for progressive frames and for interlaced frames (top field, bottom field)]

  • 8

    Page 8

    Macroblock Subdivision

    • Each picture is divided into 16x16 macroblocks.
    • The order of the macroblocks in the bitstream depends on the Macroblock Allocation Map and is not necessarily raster scan order

    [Figure: example macroblock numberings within a picture]

    MPEG-4 AVC/H.264: Encoder Architecture

    [Figure: encoder block diagram. Blocks: Coder Control, Transform/Scal./Quant., Scaling & Inv. Transform, Entropy Coding, Intra-frame Prediction, Motion Compensation, Motion Estimation, De-blocking Filter, Decoder. Signals: Input Video Signal (split into macroblocks of 16x16 pixels), Control Data, Quant. Transf. coeffs, Motion Data, Intra/Inter switch, Prediction, Output Video Signal]

  • 9

    Page 9

    MPEG-4 AVC/H.264: Motion Compensation

    [Figure: encoder block diagram with the Motion Compensation block highlighted]

    Variable Block-Size Coding
    • MB types: 16x16, 16x8, 8x16, 8x8
    • 8x8 types: 8x8, 8x4, 4x8, 4x4
    • Motion vector accuracy 1/4 (6-tap filter)

  • 10

    Page 10

    Motion Compensation

    • Various block sizes and shapes for motion compensation
    • 1/4 sample accuracy (sort of per MPEG-4, Pt. 2 V.2)
      – 6-tap filtering to 1/2 sample accuracy
      – simplified filtering to 1/4 sample accuracy
      – special position with heavier filtering
    • Multiple reference pictures (per H.263++ Annex U)
    • Temporally-reversed motion and generalized B-frames
    • B-frame prediction weighting

    Block Modes of P Pictures

    • Macroblock: 16x16
    • 7 motion prediction modes
      – 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
      – Motion vector accuracy: integer, ½-, and ¼-pixel

    [Figure: partition numbering for Mode 1 to Mode 7, with 1, 2, 2, 4, 8, 8, and 16 partitions per macroblock respectively]

  • 11

    Page 11

    Motion Vector Search

    • Motion Estimation
      – Integer pixel search
      – Fractional pixel search (1/2- and 1/4-pixel)
      – Reference frame selection from multiple reference frames (max. 15 frames)
      – Search range:
        • horizontal [-2048, 2047.75] (max)
        • vertical [-512, 511.75] (max)

    Motion Estimation

    • Motion vector prediction
      – In same slice
      – Median prediction (except 16x8 and 8x16 blocks)

    A, B, C, D and E may come from different reference pictures

    V1 = median{VA, VB, VC, VD}

    1. If C is not available, VC = VD
    2. If B, C, and D are not available, VB = VC = VD = VA
    3. For any predictor not covered by the two rules above, its MV is 0

  • 12

    Page 12

    Motion Estimation

    • Motion vector prediction for 16x8 and 8x16 blocks
      – Directional segmentation prediction

    [Figure: predictor directions for 8x16 and 16x8 partitions]

    Motion Estimation

    • Integer pixel search
      – search positions are organised in a "spiral" structure around the predicted vector

        15   9  11  13  16
        17   3   1   4  18
        19   5   0   6  20
        21   7   2   8  22
        23  10  12  14  24

  • 13

    Page 13

    Motion Estimation

    • Full fractional-pixel search (½- and ¼-pixel)

    [Figure: sample grid around the best integer position C, with integer neighbours H1, H2, V1, V2, D1-D4, half-pel positions I-VIII, and quarter-pel positions a-h]

    Capital letters (C, H1, H2, ...): integer pixel positions
    Roman numerals (I, II, III, ...): 1/2-pel positions
    Lower case letters (a, b, c, ...): 1/4-pel positions

    Motion Estimation

    • Fractional pixel search
      – Check the eight 1/2-pel candidates, I ~ VIII, around the best integer-pel C; decide the best 1/2-pel V subject to the minimal cost among the 1/2-pel candidates
      – Check the eight 1/4-pel candidates, a ~ h, around the best 1/2-pel V; decide the best 1/4-pel h subject to the minimal cost among the 1/4-pel candidates
      – Select the motion vector and block-size pattern that produce the lowest cost

  • 14

    Page 14

    Fractional Pel Value Interpolation: Luma

    • Calculate half-pel values
      – use the 6-tap filter {1, -5, 20, 20, -5, 1} to get b
      – b_h = clip((b + 16) >> 5)
      – c from b values using the 6-tap filter
      – c_m = clip((c + 512) >> 10)
    • Average of integer and half-pel values to find d, e, f, g
      – e.g. d = (A + b_h) >> 1
    • h = (b_h + b_v) >> 1 (diagonal direction averaging)
    • i = (A1 + A2 + A3 + A4 + 2) >> 2
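    The half-pel and quarter-pel rules above can be sketched as follows. This is a minimal illustration, not reference code: the function names, the 8-bit clipping range, and the choice of passing six neighbouring integer samples as a list are assumptions made here.

    ```python
    TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap filter from the slide


    def clip(value, lo=0, hi=255):
        """Clamp to the sample range (8-bit range assumed here)."""
        return max(lo, min(hi, value))


    def half_pel(samples):
        """b_h = clip((b + 16) >> 5), where b is the 6-tap filter output.

        `samples` holds six consecutive integer-position luma samples.
        """
        b = sum(t * s for t, s in zip(TAPS, samples))
        return clip((b + 16) >> 5)


    def quarter_pel(integer_sample, half_sample):
        """d = (A + b_h) >> 1, as written on the slide."""
        return (integer_sample + half_sample) >> 1


    print(half_pel([100] * 6))    # flat area stays flat
    print(quarter_pel(100, 103))
    ```

    The filter taps sum to 32, which is why the result is shifted right by 5 after adding the rounding offset 16.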

    Fractional Pel Value Interpolation: Chroma

    • dx, dy are the fractional positions in units of one-eighth samples
    • A, B, C, and D are integer pixels

    [Figure: A and B on the top row, C and D on the bottom row; the interpolated sample lies dx from the left column (8-dx from the right) and dy from the top row (8-dy from the bottom)]

    $v = \left( (8-d_x)(8-d_y)A + d_x(8-d_y)B + (8-d_x)d_y C + d_x d_y D + 8^2/2 \right) / 8^2$
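    A small sketch of the chroma interpolation formula, assuming integer (floor) division and that A, B are the top-left and top-right samples and C, D the bottom-left and bottom-right ones. The function name is hypothetical.

    ```python
    def chroma_sample(A, B, C, D, dx, dy):
        """Bilinear interpolation with dx, dy in units of 1/8 sample (0..7)."""
        weighted = ((8 - dx) * (8 - dy) * A
                    + dx * (8 - dy) * B
                    + (8 - dx) * dy * C
                    + dx * dy * D)
        return (weighted + 32) // 64  # 8^2 / 2 = 32 is the rounding offset


    print(chroma_sample(0, 64, 0, 64, dx=4, dy=0))  # halfway between A and B
    ```

    The four weights always sum to 64, so dx = dy = 0 returns A unchanged.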

  • 15

    Page 15

    B-Pictures

    • Advantages:
      – Improve coding efficiency
      – Provide temporal scalability
    • 5 modes:
      – Direct Mode: derived forward and backward MVs, none transmitted
      – Forward Mode: prediction from a previous reference frame
      – Backward Mode: prediction from a subsequent reference frame
      – Bi-directional Mode: separate forward and backward MVs
      – Intra Prediction Mode
    • MVs in Direct Mode:
      – MVF = (TRB * MV) / TRD
      – MVB = (TRB - TRD) * MV / TRD

    [Figure: pictures P/I, B, P along the time axis; TRB is the distance from the previous reference to the B picture, TRD the distance between the two references; MV, MVF, and MVB are drawn between them]

    B-Pictures

    • Direct Mode
      – No MV data is transmitted
      – Same block structure as co-located MB in temporally subsequent picture
      – MVs are computed as scaled version of corresponding MV of the co-located MB

    I0 B1 B2 B3 P4 B5 B6 B7 P8

  • 16

    Page 16

    B-Pictures

    [Figure: List 0 reference, current B picture, and List 1 reference (fields f0, f1) along the time axis; the current block and the co-located block are linked by MV, MVF, and MVB; TDB and TDD are the temporal distances]

    Z = (TDB × 256) / TDD        MVF = (Z × MV + 128) >> 8
    W = Z – 256                  MVB = (W × MV + 128) >> 8
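    The integer direct-mode scaling can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, one motion-vector component is scaled at a time, and the division in Z is taken as integer division.

    ```python
    def direct_mode_mvs(mv, tdb, tdd):
        """Scale one component of the co-located MV into forward/backward MVs.

        Z   = (TDB * 256) / TDD      W   = Z - 256
        MVF = (Z * MV + 128) >> 8    MVB = (W * MV + 128) >> 8
        """
        z = (tdb * 256) // tdd
        w = z - 256
        mvf = (z * mv + 128) >> 8
        mvb = (w * mv + 128) >> 8
        return mvf, mvb


    # B picture halfway between its two references: MV = 8 splits into +4 / -4.
    print(direct_mode_mvs(8, tdb=2, tdd=4))
    ```

    The factor 256 and the final shift by 8 keep the scaling in integer arithmetic; 128 is the rounding offset.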

    Mode Decision

    • Block difference
      – Diff(i,j) = Original(i,j) - Prediction(i,j)
    • SAD and SATD
      – DiffT means apply the Hadamard transform to Diff

    $SAD = \sum_{i,j} |Diff(i,j)|$

    $SATD = \left( \sum_{i,j} |DiffT(i,j)| \right) / 2$

    [Figure: flow from Prediction to Block_difference to Hadamard transform to SA(T)D, keeping SA(T)Dmin; used in the integer-pel search and in the loop for prediction mode decision]
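    The two distortion measures above can be sketched for a single 4x4 block as follows. The helper names are hypothetical, and the 4x4 Hadamard matrix is assumed to be the one used for the luma DC transform elsewhere in these slides.

    ```python
    # 4x4 Hadamard matrix (symmetric, so H @ D @ H equals H @ D @ H^T)
    H4 = [[1, 1, 1, 1],
          [1, 1, -1, -1],
          [1, -1, -1, 1],
          [1, -1, 1, -1]]


    def matmul(a, b):
        """Plain 4x4 integer matrix product."""
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]


    def block_diff(original, prediction):
        """Diff(i,j) = Original(i,j) - Prediction(i,j)."""
        return [[original[i][j] - prediction[i][j] for j in range(4)]
                for i in range(4)]


    def sad(diff):
        """Sum of absolute differences."""
        return sum(abs(v) for row in diff for v in row)


    def satd(diff):
        """Sum of absolute Hadamard-transformed differences, divided by 2."""
        diff_t = matmul(matmul(H4, diff), H4)
        return sum(abs(v) for row in diff_t for v in row) // 2


    orig = [[11] * 4 for _ in range(4)]
    pred = [[10] * 4 for _ in range(4)]
    d = block_diff(orig, pred)
    print(sad(d), satd(d))
    ```

    A constant offset concentrates all transform energy in one coefficient, so SATD is lower than SAD for this block.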

  • 17

    Page 17

    Mode Decision

    • Given the last decoded frames, the Lagrange multipliers
      $L_{MODE} = 0.85 \times 2^{QP/3}$, $L_{MOTION} = \sqrt{L_{MODE}}$,
      and the MB quantization parameter QP.
      (Note: L_MODE for a B or SP frame is 4 times that for an I or P frame.)

    Mode Decision

    • Choose intra prediction modes for the Intra 4x4 macroblock mode by minimizing the cost with
      $IMODE \in \{DC, HOR, VERT, DIAG\_DL, DIAG\_DR, VERT\_R, VERT\_L, HOR\_U, HOR\_D\}$
    • Determine the best Intra16x16 prediction mode by choosing the mode that results in the minimum SATD.

  • 18

    Page 18

    Mode Decision

    • For each 8x8 sub-partition
      – Perform motion estimation and reference frame selection by minimizing
        SSD + L x Rate(MV, REF)
      – B frames: choose the prediction direction by minimizing
        SSD + L x Rate(MV(PDIR), REF(PDIR))
      – Determine the coding mode of the 8x8 sub-partition using the rate-constrained mode decision, i.e. minimize
        SSD + L x Rate(MV, REF, Luma-Coeff, block 8x8 mode)
    • Here the SSD calculation is based on the reconstructed signal after DCT, quantization, and IDCT

    $SSD(s,c,MODE|QP) = \sum_{x=1,y=1}^{16,16} \left( s_Y[x,y] - c_Y[x,y,MODE|QP] \right)^2 + \sum_{x=1,y=1}^{8,8} \left( s_U[x,y] - c_U[x,y,MODE|QP] \right)^2 + \sum_{x=1,y=1}^{8,8} \left( s_V[x,y] - c_V[x,y,MODE|QP] \right)^2$

    Mode Decision

    • Perform motion estimation and reference frame selection for 16x16, 16x8, and 8x16 modes by minimizing

      $J(REF|L_{MOTION}) = SATD(s, c(REF, \mathbf{m}(REF))) + L_{MOTION} \cdot \left( R(\mathbf{m}(REF) - \mathbf{p}(REF)) + R(REF) \right)$

    • B frames: determine the prediction direction by minimizing

      $J(PDIR|L_{MOTION}) = SATD(s, c(PDIR, \mathbf{m}(PDIR))) + L_{MOTION} \cdot \left( R(\mathbf{m}(PDIR) - \mathbf{p}(PDIR)) + R(REF(PDIR)) \right)$

  • 19

    Page 19

    Mode Decision

    • Choose the MB prediction mode by minimizing

      $J(s,c,MODE|QP,L_{MODE}) = SSD(s,c,MODE|QP) + L_{MODE} \cdot R(s,c,MODE|QP)$

      I: $MODE \in \{INTRA4\times4, INTRA16\times16\}$
      P: $MODE \in \{INTRA4\times4, INTRA16\times16, SKIP, 16\times16, 16\times8, 8\times16, 8\times8\}$
      B: $MODE \in \{INTRA4\times4, INTRA16\times16, DIRECT, 16\times16, 16\times8, 8\times16, 8\times8\}$

    • "Skip mode" refers to the 16x16 mode where no motion and residual information is encoded
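    The Lagrangian mode choice J = SSD + L_MODE · R can be sketched as below. The SSD and rate figures are made-up inputs for illustration; in an encoder they would come from actually coding the macroblock in each mode. Function names are hypothetical.

    ```python
    def rd_cost(ssd, rate_bits, l_mode):
        """J = SSD + L_MODE * R for one candidate mode."""
        return ssd + l_mode * rate_bits


    def choose_mode(candidates, l_mode):
        """Return the mode with the lowest rate-distortion cost.

        `candidates` maps a mode name to an (ssd, rate_bits) pair.
        """
        return min(candidates, key=lambda mode: rd_cost(*candidates[mode], l_mode))


    # Hypothetical measurements for one P-frame macroblock.
    measured = {"SKIP": (900, 1), "16x16": (400, 40), "8x8": (250, 120)}
    for l_mode in (0.5, 2, 50):
        print(l_mode, choose_mode(measured, l_mode))
    ```

    A small multiplier favours low distortion (8x8 here), while a large one favours cheap modes such as SKIP, which is the trade-off the multiplier is meant to control.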

    MPEG-4 AVC/H.264: Intra Prediction

    [Figure: encoder block diagram with the Intra-frame Prediction block highlighted]

    Directional spatial prediction (9 types for luma, 1 chroma)

        Q A B C D E F G H
        I a b c d
        J e f g h
        K i j k l
        L m n o p
        M
        N
        O
        P

    [Figure: prediction direction arrows numbered 0, 1, 3, 4, 5, 6, 7, 8]

    • e.g., Mode 3: diagonal down/right prediction
      a, f, k, p are predicted by (A + 2Q + I + 2) >> 2

  • 20

    Page 20

    Intra Prediction: 4x4 Luma Blocks

    • Mode 0: vertical prediction
    • Mode 1: horizontal prediction
    • Mode 2: DC prediction
    • Mode 3: diagonal down/left prediction
    • Mode 4: diagonal down/right prediction
    • Mode 5: vertical-left
    • Mode 6: horizontal-down
    • Mode 7: vertical-right
    • Mode 8: horizontal-up

    [Figure: prediction direction arrows numbered 0, 1, 3, 4, 5, 6, 7, 8]

    DC prediction:
    pred(x, y) = average of pixels A, B, C, D, E, F, G, and H

    Intra Prediction: 4x4 Luma Prediction

    Sample layout used for Mode 0 (vertical) and Mode 1 (horizontal):

        I A B C D
        E a b c d
        F e f g h
        G i j k l
        H m n o p
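    Modes 0, 1, and 2 can be sketched as follows, using the layout in which A-D are the samples above the block and E-H the samples to its left. The function names are hypothetical, and the rounding offset in the DC average is an assumption; the slide only says "average".

    ```python
    def predict_vertical(top):
        """Mode 0: every row repeats the samples A..D above the block."""
        return [list(top) for _ in range(4)]


    def predict_horizontal(left):
        """Mode 1: every column repeats the samples E..H left of the block."""
        return [[left[y]] * 4 for y in range(4)]


    def predict_dc(top, left):
        """Mode 2: all 16 samples take the (rounded) mean of A..H."""
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]


    top = [10, 20, 30, 40]   # A, B, C, D
    left = [50, 60, 70, 80]  # E, F, G, H
    print(predict_vertical(top)[3])
    print(predict_horizontal(left)[1])
    print(predict_dc(top, left)[0])
    ```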

  • 21

    Page 21

    Intra Prediction: 16x16 Luma Blocks

    • Mode 0: Vertical
    • Mode 1: Horizontal
    • Mode 2: DC
    • Mode 3: Plane
      – Used only if all neighboring samples are available

    [Figure: 16x16 block with neighbouring samples P(15,-1) at the end of the top row and P(-1,15) at the end of the left column; (x,y) is a position inside the block]

    Pred(x,y) = Clip((a + b·(x-7) + c·(y-7) + 16) >> 5), where
      a = 16·(P(-1,15) + P(15,-1))
      b = (5·H + 32) >> 6
      c = (5·V + 32) >> 6

    $H = \sum_{x=1}^{8} x \cdot \left( P(7+x,-1) - P(7-x,-1) \right)$

    $V = \sum_{y=1}^{8} y \cdot \left( P(-1,7+y) - P(-1,7-y) \right)$
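    The plane-mode formulas can be sketched as follows. This is an illustration, not reference code: the neighbouring samples are supplied through a callable `P(x, y)` (a hypothetical interface chosen here), and clipping to an 8-bit range is assumed.

    ```python
    def clip(value, lo=0, hi=255):
        """Clamp to the sample range (8-bit range assumed here)."""
        return max(lo, min(hi, value))


    def predict_plane(P):
        """16x16 plane prediction.

        P(x, -1) gives the row above the block and P(-1, y) the column to its
        left, with x and y running from -1 to 15.
        """
        H = sum(x * (P(7 + x, -1) - P(7 - x, -1)) for x in range(1, 9))
        V = sum(y * (P(-1, 7 + y) - P(-1, 7 - y)) for y in range(1, 9))
        a = 16 * (P(-1, 15) + P(15, -1))
        b = (5 * H + 32) >> 6
        c = (5 * V + 32) >> 6
        return [[clip((a + b * (x - 7) + c * (y - 7) + 16) >> 5)
                 for x in range(16)]
                for y in range(16)]


    def ramp(x, y):
        """Top row rises by 1 per sample; left column is constant."""
        return 100 + x if y == -1 else 99


    pred = predict_plane(ramp)
    print(pred[0][0], pred[0][15])
    ```

    With a horizontal ramp above the block, H picks up the gradient and the prediction continues the ramp across every row, while V stays zero.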

    Intra Prediction: 16x16 Luma Blocks

    [Figure: 16x16 prediction modes, showing the row H above the block, the column V to its left, and DC prediction as Mean(H+V)]

  • 22

    Page 22

    MPEG-4 AVC/H.264: Transform Coding

    [Figure: encoder block diagram with the Transform/Scal./Quant. block highlighted]

    4x4 Block Integer Transform

    $H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$

    Main Profile: Adaptive Block Size Transform (8x4, 4x8, 8x8)
    Repeated transform of DC coeffs for 8x8 chroma and 16x16 Intra luma blocks

    Transform Coding: Luma DC

    • Luma DC in Intra_16x16 MB
      – Using the Hadamard transformation

    Forward transform:

    $Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) // 2$

    Inverse transform:

    $X_{QD} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} y_{QD00} & y_{QD01} & y_{QD02} & y_{QD03} \\ y_{QD10} & y_{QD11} & y_{QD12} & y_{QD13} \\ y_{QD20} & y_{QD21} & y_{QD22} & y_{QD23} \\ y_{QD30} & y_{QD31} & y_{QD32} & y_{QD33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$
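    A sketch of the luma DC Hadamard pair, assuming the 4x4 matrix with rows (1,1,1,1), (1,1,-1,-1), (1,-1,-1,1), (1,-1,1,-1). The "// 2" is implemented with Python floor division, which is an assumption; the slide's "//" may denote division with rounding, which differs for odd or negative values.

    ```python
    H4 = [[1, 1, 1, 1],
          [1, 1, -1, -1],
          [1, -1, -1, 1],
          [1, -1, 1, -1]]


    def matmul(a, b):
        """Plain 4x4 integer matrix product."""
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]


    def luma_dc_forward(x_d):
        """Y_D = (H4 * X_D * H4) // 2 (floor division assumed)."""
        t = matmul(matmul(H4, x_d), H4)
        return [[v // 2 for v in row] for row in t]


    def luma_dc_inverse(y_qd):
        """X_QD = H4 * Y_QD * H4."""
        return matmul(matmul(H4, y_qd), H4)


    dc = [[4] * 4 for _ in range(4)]
    y = luma_dc_forward(dc)
    print(y[0])
    print(luma_dc_inverse(y)[0])
    ```

    Without quantization the round trip returns the input scaled by 8 (16 from applying the Hadamard matrix four times, halved once), which the quantizer and dequantizer scaling are left to absorb.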

  • 23

    Page 23

    Transform Coding: Luma DC

    [Figure: block numbering within a macroblock]

    CBPY 8x8 block order (raster scan order in MB):

        0 1
        2 3

    Luma 4x4 DC for Intra 16x16 macroblock type: block -1

    Luma 4x4 block order for 4x4 intra prediction and 4x4 residual coding (raster scan order within 8x8 region nested in raster scan order of 8x8 regions):

        0  1  4  5
        2  3  6  7
        8  9 12 13
       10 11 14 15

    Chroma 4x4 block order for 4x4 residual coding, shown as 16-25, and intra 4x4 prediction, shown as 18-21 and 22-25 (raster scan order in each 8x8 chroma region):

        2x2 DC:  Cb 16      Cr 17
        AC:      Cb 18 19   Cr 22 23
                    20 21      24 25

    Transform Coding: Chroma DC

    • Chroma DC in 8x8 block
      – Hadamard transformation

    Forward transform:

    $Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} \\ x_{D10} & x_{D11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$

    Inverse transform:

    $X_{QD} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} Y_{QD00} & Y_{QD01} \\ Y_{QD10} & Y_{QD11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$

  • 24

    Page 24

    Transform: Luma and Chroma residual

    • Luminance and chrominance 4x4 residual blocks
    • Forward transform

    $Y = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}$

    • Inverse transform

    $X = \begin{bmatrix} 1 & 1 & 1 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} & -1 & -1 \\ 1 & -\tfrac{1}{2} & -1 & 1 \\ 1 & -1 & 1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} y_{00} & y_{01} & y_{02} & y_{03} \\ y_{10} & y_{11} & y_{12} & y_{13} \\ y_{20} & y_{21} & y_{22} & y_{23} \\ y_{30} & y_{31} & y_{32} & y_{33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$
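    The forward 4x4 integer transform, Y = C · X · Cᵀ, can be sketched with plain integer arithmetic. The names `C`, `transpose`, and `forward_transform` are chosen here for illustration.

    ```python
    # Forward transform matrix from the slide
    C = [[1, 1, 1, 1],
         [2, 1, -1, -2],
         [1, -1, -1, 1],
         [1, -2, 2, -1]]


    def matmul(a, b):
        """Plain 4x4 integer matrix product."""
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]


    def transpose(m):
        return [list(col) for col in zip(*m)]


    def forward_transform(x):
        """Y = C * X * C^T on a 4x4 residual block."""
        return matmul(matmul(C, x), transpose(C))


    flat = [[1] * 4 for _ in range(4)]
    for row in forward_transform(flat):
        print(row)
    ```

    A flat residual block produces a single non-zero coefficient at position (0,0), which is the DC term that the additional Hadamard stage operates on for Intra 16x16 luma and for chroma.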

    Quantization/Dequantization (1/6)

    • Scan Order
      – 4x4 residual and 4x4 luma DC block

         0  1  5  6
         2  4  7 12
         3  8 11 13
         9 10 14 15

      – 2x2 chroma DC block
        • Raster order
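    The 4x4 scan table above gives, for each block position, its index in the scanned sequence. A short sketch of applying it (the names `SCAN_INDEX` and `zigzag_scan` are hypothetical):

    ```python
    # SCAN_INDEX[row][col] is the position of that coefficient in the scan
    SCAN_INDEX = [[0, 1, 5, 6],
                  [2, 4, 7, 12],
                  [3, 8, 11, 13],
                  [9, 10, 14, 15]]


    def zigzag_scan(block):
        """Reorder a 4x4 block into a 16-element list following the scan table."""
        out = [None] * 16
        for row in range(4):
            for col in range(4):
                out[SCAN_INDEX[row][col]] = block[row][col]
        return out


    raster = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(raster)[:6])
    ```

    Scanning this way places the low-frequency coefficients first, so the trailing part of the list tends to hold the zeros after quantization.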

  • 25

    Page 25

    Quantization/Dequantization (2/6)

    • QP: 0 ~ 51
    • QP_Y: QP for luma coefficients
    • QP_C: QP for chroma coefficients
      – QP_C for chroma is determined from the current value of QP_Y

    [Figure label: QP_Y]

  • 26

    Page 26

    Quantization/Dequantization (4/6)

    • 4x4 luma DC block
    • Quantization

      $Y_{QD}(i,j) = \left[ Y_D(i,j) \cdot Q(QP\%6,0,0) + 2f \right] / 2^{18+QP/6}, \quad i,j = 0,\ldots,3$

      – f = 2^{17+QP/6}/3 for intra frames
      – f = 2^{17+QP/6}/6 for inter frames
      – f has the same sign as the coefficient that is being quantized

    • Dequantization

      $X_D(i,j) = \left[ X_{QD}(i,j) \cdot R(QP\%6,0,0) \right] // 4, \quad i,j = 0,\ldots,3$

    Quantization/Dequantization (5/6)

    • 2x2 chroma DC
    • Quantization

      $Y_{QD}(i,j) = \left[ Y_D(i,j) \cdot Q(QP\%6,0,0) + 2f \right] / 2^{18+QP/6}, \quad i,j = 0,1$

      – f = 2^{17+QP/6}/3 for intra frames
      – f = 2^{17+QP/6}/6 for inter frames
      – f has the same sign as the coefficient that is being quantized

    • Dequantization

      $X_D(i,j) = \left[ X_{QD}(i,j) \cdot R(QP\%6,0,0) \right] // 2, \quad i,j = 0,1$

  • 27

    Page 27

    Quantization/Dequantization (6/6)

    • Q[QP%6][i][j] = quantMat[QP%6][0] for (i,j) = {(0,0),(0,2),(2,0),(2,2)},
    • Q[QP%6][i][j] = quantMat[QP%6][1] for (i,j) = {(1,1),(1,3),(3,1),(3,3)},
    • Q[QP%6][i][j] = quantMat[QP%6][2] otherwise.

    • R[QP%6][i][j] = dequantMat[QP%6][0] for (i,j) = {(0,0),(0,2),(2,0),(2,2)},
    • R[QP%6][i][j] = dequantMat[QP%6][1] for (i,j) = {(1,1),(1,3),(3,1),(3,3)},
    • R[QP%6][i][j] = dequantMat[QP%6][2] otherwise.

    • quantMat[6][3] = {{13107, 5243, 8224},
                         {11651, 4660, 7358},
                         {10486, 4143, 6554},
                         { 9198, 3687, 5825},
                         { 8322, 3290, 5243},
                         { 7384, 2943, 4660}};

    • dequantMat[6][3] = {{40,  64, 51},
                           {45,  72, 57},
                           {50,  81, 64},
                           {57,  91, 72},
                           {63, 102, 80},
                           {71, 114, 90}};
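    The position rules for Q and R reduce to a parity test on (i, j). The sketch below copies the two tables from the slide; the helper names are hypothetical.

    ```python
    QUANT_MAT = [[13107, 5243, 8224],
                 [11651, 4660, 7358],
                 [10486, 4143, 6554],
                 [9198, 3687, 5825],
                 [8322, 3290, 5243],
                 [7384, 2943, 4660]]

    DEQUANT_MAT = [[40, 64, 51],
                   [45, 72, 57],
                   [50, 81, 64],
                   [57, 91, 72],
                   [63, 102, 80],
                   [71, 114, 90]]


    def table_column(i, j):
        """Column 0 for (even, even), column 1 for (odd, odd), else column 2."""
        if i % 2 == 0 and j % 2 == 0:
            return 0
        if i % 2 == 1 and j % 2 == 1:
            return 1
        return 2


    def q_coeff(qp, i, j):
        """Q[QP%6][i][j] as defined by the three rules on the slide."""
        return QUANT_MAT[qp % 6][table_column(i, j)]


    def r_coeff(qp, i, j):
        """R[QP%6][i][j] as defined by the three rules on the slide."""
        return DEQUANT_MAT[qp % 6][table_column(i, j)]


    print(q_coeff(7, 2, 2), r_coeff(5, 3, 3))
    ```

    Only QP % 6 selects the table row; the remaining dependence on QP enters through the power-of-two exponent QP/6 in the quantization formulas.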

    MPEG-4 AVC/H.264: Multiple Reference Frames

    [Figure: hybrid encoder block diagram (coder control, transform/quantizer, dequantizer/inverse transform, entropy coding, intra/inter switch, motion-compensated predictor, motion estimator; outputs: control data, quantized transform coefficients, motion data). Highlighted: multiple reference frames for motion compensation.]

  • 28


    MPEG-4 AVC/H.264: Residual Coding

    Residual coding is based on 4x4 blocks

    [Figure: hybrid encoder block diagram (coder control, transform/quantizer, dequantizer/inverse transform, entropy coding, intra/inter switch, motion-compensated predictor, motion estimator; outputs: control data, quantized transform coefficients, motion data). Highlighted: integer transform.]

    Residual and Intra Coding

    • EXACT MATCH Simplified Transform
      – Based primarily on a 4x4 transform (all prior standards: 8x8)
      – Requires only 16-bit arithmetic (including intermediate values)
      – Expanded to 8x8 for chroma by a 2x2 transform of the DC values
      – Easily extensible to 10-12 bits per component
    • Adaptive block transform sizes for Main Profile
    • Intra Coding Structure
      – Directional spatial prediction (10 types luma, 1 chroma)
      – Expanded to 16x16 for luma intra by a 4x4 transform of the DC values
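To make the "exact match" integer transform concrete, here is a minimal Python sketch of a 4x4 forward core transform and a 2x2 DC transform. The matrix `CF` is an assumption taken from the published H.264 standard rather than from this slide, and the function names are hypothetical. All arithmetic stays in integers, which is what lets encoder and decoder agree exactly.

```python
# Assumed 4x4 forward core transform matrix (as in the published H.264 standard).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# 2x2 Hadamard matrix, used here for the DC values of the chroma blocks.
H2 = [
    [1, 1],
    [1, -1],
]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_4x4(block):
    """Y = CF * X * CF^T, integers only (scaling is left to quantization)."""
    return matmul(matmul(CF, block), transpose(CF))


def chroma_dc_2x2(dc):
    """2x2 transform of the four DC values of an 8x8 chroma area."""
    return matmul(matmul(H2, dc), H2)


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(forward_4x4(flat))
    print(chroma_dc_2x2([[1, 2], [3, 4]]))
```

A flat block puts all of its energy into the DC position, which is why a second transform over the collected DC values pays off for smooth 8x8 chroma and 16x16 luma intra areas.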

  • 29


    Quantization and Deblocking

    • Quantization of transform coefficients
      – Logarithmic step size control
      – Extended range of step sizes
      – Smaller step size for chroma (per H.263 Annex T)
      – Table-driven
    • Reconstruction is 16-bit multiply, add, shift
    • Deblocking Filter (in the prediction loop)
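The logarithmic step size control can be checked numerically from the quantization tables in these slides. The Python sketch below is an illustration only: `relative_step` is a hypothetical helper that inverts the quantizer multiplier Q(QP%6, 0, 0) / 2^(18 + QP/6) from the chroma DC formula, so the value it returns is a relative step size, not a normative one.

```python
# First column of quantMat, copied from the slides.
QUANT_MAT_COL0 = [13107, 11651, 10486, 9198, 8322, 7384]


def relative_step(qp: int) -> float:
    """Inverse of the quantizer multiplier Q / 2^(18 + QP/6) (hypothetical helper)."""
    return 2.0 ** (18 + qp // 6) / QUANT_MAT_COL0[qp % 6]


if __name__ == "__main__":
    for qp in range(0, 13):
        ratio = relative_step(qp + 1) / relative_step(qp)
        print(qp, round(relative_step(qp), 2), round(ratio, 3))
```

With these table values each QP increment raises the step by roughly 10 to 14 percent, and an increase of 6 in QP doubles it exactly, because only the power-of-two shift changes.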

    Deblocking Filter

    [Figure: 16x16 macroblock with its horizontal and vertical edges for luma and chroma. Boundaries in a macroblock to be filtered (luma boundaries shown with solid lines and chroma boundaries shown with dotted lines).]

  • 30


    Deblocking Filter

    • Content-dependent boundary filtering strength
      – For each boundary between neighbouring 4x4 luma blocks, a “Boundary Strength” Bs is assigned
      – If Bs = 0, filtering is skipped for that particular edge
      – In all other cases, filtering is dependent on the local sample properties and the value of Bs

    Deblocking Filter

    • Flowchart to determine the boundary strength Bs for the block boundary between block p and block q:
      1. Block p or q is intra coded, or slice type is SI or SP?
         – YES, and the block boundary is also a macroblock boundary: Bs = 4
         – YES, and the block boundary is not a macroblock boundary: Bs = 3
         – NO: continue with step 2
      2. Coefficients coded in block p or q?
         – YES: Bs = 2
         – NO: continue with step 3
      3. Blocks p and q have different reference frames or a different number of reference frames?
         – YES: Bs = 1
         – NO: continue with step 4
      4. |V1(p,x) - V1(q,x)| >= 1 or |V1(p,y) - V1(q,y)| >= 1 or, if bi-predictive, |V2(p,x) - V2(q,x)| >= 1 or |V2(p,y) - V2(q,y)| >= 1?
         – YES: Bs = 1
         – NO: Bs = 0 (skip)
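The boundary-strength decision can be written as a short function. The following Python sketch is an illustration, not the normative procedure: the `Block` record and its field names are hypothetical, motion vectors are assumed to be given in luma-sample units, and the vectors of the two blocks are compared pairwise in list order, which is a simplification for the bi-predictive case.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Block:
    """Hypothetical description of one 4x4 block next to an edge."""
    intra: bool
    has_coefficients: bool
    ref_frames: Tuple[int, ...]                      # one or two reference frames
    motion_vectors: Tuple[Tuple[float, float], ...]  # (x, y) per reference frame


def boundary_strength(p: Block, q: Block, on_mb_edge: bool,
                      slice_is_si_or_sp: bool = False) -> int:
    # Intra-coded blocks (or SI/SP slices) get the strongest filtering.
    if p.intra or q.intra or slice_is_si_or_sp:
        return 4 if on_mb_edge else 3
    # Coded residual coefficients on either side.
    if p.has_coefficients or q.has_coefficients:
        return 2
    # Different reference frames or a different number of them.
    if (len(p.ref_frames) != len(q.ref_frames)
            or set(p.ref_frames) != set(q.ref_frames)):
        return 1
    # Motion vectors differing by at least one sample in x or y.
    for (px, py), (qx, qy) in zip(p.motion_vectors, q.motion_vectors):
        if abs(px - qx) >= 1 or abs(py - qy) >= 1:
            return 1
    return 0  # edge is skipped


if __name__ == "__main__":
    a = Block(False, False, (0,), ((1.0, 0.0),))
    b = Block(False, False, (0,), ((1.25, 0.0),))
    print(boundary_strength(a, b, on_mb_edge=True))
```

The order of the checks mirrors the flowchart: the earliest matching condition decides, so an intra block always dominates differences in coefficients or motion.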

  • 31


    Deblocking Filter

    • Thresholds for each block boundary
      – The set of samples across this edge is only filtered if the following condition holds:
        Bs ≠ 0 && |p0 – q0| < α && |p1 – p0| < β && |q1 – q0| < β
      – α and β are determined by IndexA and IndexB, respectively
      – IndexA = Clip3(0, 51, QPav + Filter_Offset_A)
      – IndexB = Clip3(0, 51, QPav + Filter_Offset_B)
      – Filter_Offset_A and Filter_Offset_B are used to modify the filter characteristics

    Clip3(a, b, c) = a, if c < a
                     b, if c > b
                     c, otherwise

    Samples across the edge: p3 p2 p1 p0 | q0 q1 q2 q3
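The threshold test can be sketched as follows in Python. The names `table_index` and `should_filter` are hypothetical, and the α and β lookup tables themselves are not reproduced in these slides, so the sketch takes α and β as already looked-up parameters.

```python
def clip3(a, b, c):
    """Clip c to the range [a, b]."""
    if c < a:
        return a
    if c > b:
        return b
    return c


def table_index(qp_av: int, filter_offset: int) -> int:
    """IndexA or IndexB: the averaged QP plus an offset, clipped to 0..51."""
    return clip3(0, 51, qp_av + filter_offset)


def should_filter(bs: int, p, q, alpha: int, beta: int) -> bool:
    """p = (p0, p1, p2, p3) and q = (q0, q1, q2, q3), counted away from the edge."""
    return (bs != 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


if __name__ == "__main__":
    print(table_index(30, -2))
    # A small step across the edge looks like a blocking artifact.
    print(should_filter(2, (100, 101, 101, 102), (104, 103, 103, 102), 10, 3))
    # A large step is treated as a real image edge and left alone.
    print(should_filter(2, (100, 101, 101, 102), (150, 151, 151, 152), 10, 3))
```

The intent of the condition is visible in the two calls: small discontinuities at a block edge are smoothed, while large ones are assumed to belong to the picture content.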

    Deblocking Filter: Bs < 4

    • Δ = Clip3(–C, C, ((((q0 – p0) << 2) + (p1 – q1) + 4) >> 3))
    • P0 = Clip1(p0 + Δ)
    • Q0 = Clip1(q0 – Δ)
      – ap = |p2 – p0|
      – aq = |q2 – q0|
      – If ap < β, P1 = p1 + Clip3(–C0, C0, (p2 + ((p0 + q0) >> 1) – (p1 << 1)) >> 1)
      – If aq < β, Q1 = q1 + Clip3(–C0, C0, (q2 + ((p0 + q0) >> 1) – (q1 << 1)) >> 1)
      – C0 is determined by IndexA and Bs
      – Clip1(x) = Clip3(0, 255, x)
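A minimal Python sketch of the Bs < 4 filter for one line of samples across an edge is given below. Several points are assumptions: the Δ expression follows the form in the published H.264 standard because the slide text for it is damaged in this transcript, the function name `filter_normal` is hypothetical, and since the slides do not state how C is derived from C0, both are passed in as parameters.

```python
def clip3(a, b, c):
    return a if c < a else b if c > b else c


def clip1(x):
    """Clip to the 8-bit sample range."""
    return clip3(0, 255, x)


def filter_normal(p, q, c, c0, beta):
    """Filter one line of samples for Bs < 4.

    p = (p0, p1, p2, p3) and q = (q0, q1, q2, q3), counted away from the edge.
    Returns the filtered (p, q) as lists.
    """
    p0, p1, p2 = p[0], p[1], p[2]
    q0, q1, q2 = q[0], q[1], q[2]
    new_p, new_q = list(p), list(q)

    delta = clip3(-c, c, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p[0] = clip1(p0 + delta)
    new_q[0] = clip1(q0 - delta)

    # The second sample on each side is only touched if that side is smooth.
    if abs(p2 - p0) < beta:
        new_p[1] = p1 + clip3(-c0, c0, (p2 + ((p0 + q0) >> 1) - (p1 << 1)) >> 1)
    if abs(q2 - q0) < beta:
        new_q[1] = q1 + clip3(-c0, c0, (q2 + ((p0 + q0) >> 1) - (q1 << 1)) >> 1)
    return new_p, new_q


if __name__ == "__main__":
    print(filter_normal((100, 100, 100, 100), (108, 108, 108, 108), c=2, c0=1, beta=4))
```

Python's `>>` on negative integers is an arithmetic shift, which is the behaviour the formulas rely on. The clipping bounds C and C0 limit how far any sample can move, so the filter softens a step without erasing it.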

  • 32


    Deblocking Filter: Bs = 4

    • Left/upper side
    • If the following condition holds:
      – ap < β && |p0 – q0| < ((α >> 2) + 2)   (8-71)
      – P0 = (p2 + 2*p1 + 2*p0 + 2*q0 + q1 + 4) >> 3
      – P1 = (p2 + p1 + p0 + q0 + 2) >> 2
      – In the case of luma filtering, P2 = (2*p3 + 3*p2 + p1 + p0 + q0 + 4) >> 3
    • Otherwise, if the condition of (8-71) does not hold,
      – P0 = (2*p1 + p0 + q1 + 2) >> 2

    Deblocking Filter: Bs = 4

    • Right/lower side
    • If the following condition holds:
      – aq < β && |p0 – q0| < ((α >> 2) + 2)   (8-76)
      – Q0 = (p1 + 2*p0 + 2*q0 + 2*q1 + q2 + 4) >> 3   (8-77)
      – Q1 = (p0 + q0 + q1 + q2 + 2) >> 2   (8-78)
      – In the case of luma filtering, Q2 = (2*q3 + 3*q2 + q1 + q0 + p0 + 4) >> 3   (8-79)
    • Otherwise, if the condition of (8-76) does not hold,
      – Q0 = (2*q1 + q0 + p1 + 2) >> 2
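Both sides of the Bs = 4 filter can be combined into one Python sketch. The function name `filter_strong` and the `luma` flag are hypothetical; the arithmetic follows the slide equations, with ap = |p2 – p0| and aq = |q2 – q0| as defined for the Bs < 4 case.

```python
def filter_strong(p, q, alpha, beta, luma=True):
    """Filter one line of samples for Bs = 4.

    p = (p0, p1, p2, p3) and q = (q0, q1, q2, q3), counted away from the edge.
    Returns the filtered (p, q) as lists.
    """
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    new_p, new_q = list(p), list(q)
    small_step = abs(p0 - q0) < ((alpha >> 2) + 2)

    # Left/upper side, condition (8-71).
    if abs(p2 - p0) < beta and small_step:
        new_p[0] = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
        new_p[1] = (p2 + p1 + p0 + q0 + 2) >> 2
        if luma:
            new_p[2] = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    else:
        new_p[0] = (2 * p1 + p0 + q1 + 2) >> 2

    # Right/lower side, condition (8-76).
    if abs(q2 - q0) < beta and small_step:
        new_q[0] = (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3
        new_q[1] = (p0 + q0 + q1 + q2 + 2) >> 2
        if luma:
            new_q[2] = (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3
    else:
        new_q[0] = (2 * q1 + q0 + p1 + 2) >> 2
    return new_p, new_q


if __name__ == "__main__":
    print(filter_strong((100,) * 4, (110,) * 4, alpha=40, beta=4))
    print(filter_strong((100,) * 4, (110,) * 4, alpha=8, beta=4))
```

In the first call the step of 10 is below (α >> 2) + 2 = 12, so three samples on each side are smoothed; in the second call the limit is 4, so only p0 and q0 change.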

  • 33


    Deblocking Filter

    [Figure: highly compressed decoded inter picture, (1) without filter and (2) with H.264/AVC deblocking.]

    Entropy Coding

    [Figure: H.264/AVC encoder block diagram. The input video signal is split into macroblocks of 16x16 pixels; blocks: coder control, transform/scaling/quantization, inverse scaling and transform, de-blocking filter, intra-frame prediction, motion compensation, motion estimation, intra/inter switch, entropy coding; outputs: control data, quantized transform coefficients, motion data, output video signal. Highlighted: entropy coding.]

  • 34


    Variable Length Coding

    • Exp-Golomb code is used universally for all symbols except for transform coefficients
    • Context-adaptive VLCs for coding of transform coefficients
      – No end-of-block, but number of coefficients is decoded
      – Coefficients are scanned backwards
      – Contexts are built dependent on transform coefficients
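For reference, the unsigned Exp-Golomb code mentioned above is easy to sketch in Python. The function names `ue_encode` and `ue_decode` are hypothetical, and bit strings are used instead of a real bitstream writer to keep the example short.

```python
def ue_encode(code_num: int) -> str:
    """Unsigned Exp-Golomb code word for code_num >= 0, as a bit string."""
    value = code_num + 1
    length = value.bit_length()
    # (length - 1) leading zeros, followed by value in binary.
    return "0" * (length - 1) + format(value, "b")


def ue_decode(bits: str, pos: int = 0):
    """Decode one code word starting at pos; returns (code_num, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    value = int(bits[pos + zeros:end], 2)
    return value - 1, end


if __name__ == "__main__":
    for n in range(6):
        print(n, ue_encode(n))
    stream = ue_encode(1) + ue_encode(0) + ue_encode(3)
    pos = 0
    while pos < len(stream):
        symbol, pos = ue_decode(stream, pos)
        print(symbol)
```

The code is prefix-free and needs no stored table: the number of leading zeros tells the decoder how many bits follow, which is why a single code structure can serve all syntax elements other than the transform coefficients.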

    Context-based Adaptive Binary Arithmetic Coding (CABAC)

    • Usage of adaptive probability models for most symbols
    • Exploiting symbol correlations by using contexts
    • Restriction to binary arithmetic coding
      – Simple and fast adaptation mechanism
      – Fast binary arithmetic codec based on table look-ups and shifts only
    • Average bit-rate saving over CAVLC: 10-15%

  • 35


    SP/SI Frame

    • SP frame:
      – motion-compensated predictive coding
      – similar to P
      – SP allows identical reconstruction even when different reference pictures are being used
    • SI frame:
      – spatial prediction
      – similar to I
      – SI allows identical reconstruction to a corresponding SP
    • Both provide functionalities for bitstream switching, splicing, random access, VCR functionalities such as fast-forward, and error resilience/recovery

    SP/SI Frame: Bitstream Switching

    [Figure: bitstream 1 and bitstream 2, each a sequence of P frames containing a switching frame (S1 and S2, respectively); the frame S12 connects bitstream 1 to bitstream 2.]

  • 36


    SP/SI Frame: Bitstream Splicing

    [Figure: bitstream 1 and bitstream 2, each a sequence of P frames containing a switching frame (S1 and S2, respectively); the frame SI2 is used to splice into bitstream 2.]

    SP/SI Frame: Error Resiliency/Recovery

    [Figure: a sequence of P frames containing the frames S1 and S2, together with the alternative frames S12 and SI2.]

  • 37


    Profiles and Levels

    Profiles

    • Baseline profile
    • Extended profile
    • Main profile

  • 38


    Baseline Profile

    • I and P picture types
    • In-loop deblocking filter
    • 1/4-sample motion compensation
    • VLC-based entropy coding: CAVLC
    • 4:2:0 chrominance format
    • Field pictures (for Level 2.1 and above)
    • Use 15 or fewer reference frames
    • Have a compression ratio per picture of 4:1 or greater

    Extended Profile

    • Bi-predictive slices
    • SP and SI slices
    • Weighted prediction
    • All features included in the Baseline Profile

  • 39


    Main Profile

    • CABAC
    • Interlaced pictures
    • All features included in the Baseline Profile

    Level Definitions

    Level #   Max Picture    Max Video Bitrate   Horizontal MV Range   Vertical MV Range   Minimum luma
              Size (MBs)     (1000 bits/sec)     (full pels)           (full pels)         Bi-predictive block size
    1         99             64                  [-2048, 2047.75]      [-64, 63.75]        8x8
    1.1       396            128                 [-2048, 2047.75]      [-128, 127.75]      8x8
    1.2       396            768                 [-2048, 2047.75]      [-128, 127.75]      8x8
    2         396            2000                [-2048, 2047.75]      [-128, 127.75]      8x8
    2.1       792            4000                [-2048, 2047.75]      [-256, 255.75]      8x8
    2.2       1620           4000                [-2048, 2047.75]      [-256, 255.75]      8x8
    3         1620           8000                [-2048, 2047.75]      [-256, 255.75]      8x8
    3.1       3600           20000               [-2048, 2047.75]      [-512, 511.75]      8x8
    3.2       5120           20000               [-2048, 2047.75]      [-512, 511.75]      8x8
    4         8192           20000               [-2048, 2047.75]      [-512, 511.75]      8x8
    5         19200          TBD                 [-2048, 2047.75]      TBD                 8x8
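As a usage illustration for the level table, the Python sketch below finds the lowest level whose maximum picture size (in macroblocks) accommodates a given resolution. It is deliberately partial: the helper names are hypothetical, only the "Max Picture Size" column from the slide is used, and bit-rate and motion-vector limits are ignored.

```python
# (level, max picture size in macroblocks), copied from the slide's table.
LEVEL_MAX_PICTURE_MBS = [
    ("1", 99), ("1.1", 396), ("1.2", 396), ("2", 396),
    ("2.1", 792), ("2.2", 1620), ("3", 1620),
    ("3.1", 3600), ("3.2", 5120), ("4", 8192), ("5", 19200),
]


def macroblocks(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering a picture (dimensions rounded up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level_for_picture(width: int, height: int):
    """Lowest level whose picture-size limit fits; None if no level fits."""
    needed = macroblocks(width, height)
    for level, max_mbs in LEVEL_MAX_PICTURE_MBS:
        if needed <= max_mbs:
            return level
    return None


if __name__ == "__main__":
    for size in ((176, 144), (352, 288), (720, 576), (1280, 720), (1920, 1080)):
        print(size, macroblocks(*size), lowest_level_for_picture(*size))
```

For example, QCIF (176x144) is 11 x 9 = 99 macroblocks and fits Level 1 exactly, while 720x576 needs 1620 macroblocks and therefore at least Level 2.2 by this criterion alone.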

  • 40


    H.264 Codec Design Summary

    • Video coding layer is based on hybrid video coding and is similar in spirit to other standards, but with important differences
    • New key features are:
      – Enhanced motion compensation
      – Small blocks for transform coding
      – Improved de-blocking filter
      – Enhanced entropy coding
    • Substantial bit-rate savings relative to other standards for the same quality

    Complexity of H.264 Codec Design

    • Codec design includes relaxation of traditional bounds on complexity (memory & computation): rough guess 2-3x decoding power increase relative to MPEG-2, 3-4x encoding
    • Problem areas:
      – Smaller block sizes for motion compensation (cache access issues)
      – Longer filters for motion compensation (more memory access)
      – Multi-frame motion compensation (more memory for reference frame storage)
      – More segmentations of macroblock to choose from (more searching in the encoder)
      – More methods of predicting intra data (more searching)
      – Arithmetic coding (adaptivity, computation on output bits)

  • 41


    Performance Comparison

    • Test of different standards
    • Using the same rate-distortion optimization techniques for all codecs
    • Streaming test: high-latency (included B frames)
    • Real-time conversation test: no B frames
    • Several video sequences for each test
    • Compare four codecs:
      – MPEG-2 (in high-latency/streaming test only)
      – H.263 (high-latency profile, conversational high-compression profile, baseline profile)
      – MPEG-4 (simple profile and advanced simple profile with & without B pictures)
      – JVT/H.26L/AVC (with & without B pictures)

    Coding Efficiency Comparison (1/4)

    [Figure: PSNR [dB] versus bit-rate [kbps] for Foreman, 10 Hz, QCIF, 100 frames encoded. Curves: intraframe DCT coding (DCT 1974, JPEG 1992); frame difference coding (H.120 1988); integer-pel motion compensation (H.261 1991); half-pel motion compensation (MPEG-1 1993); TMN-10 variable block size motion compensation (H.263 1998). The figure carries the annotation "67 %".]

  • 42


    Coding Efficiency Comparison (2/4)

    [Figure: quality Y-PSNR [dB] versus bit-rate [kbit/s] for Foreman QCIF 10 Hz. Curves: MPEG-2, H.263, MPEG-4, JVT/H.264/AVC.]

    Coding Efficiency Comparison (3/4)

    [Figure: Y-PSNR versus bit-rate (Mbit/sec) for Alias, 24 fps, SDTV. Curves: MPEG-2 (QP 2-7), AVC (QP 10, 18, 26).]

  • 43


    Coding Efficiency Comparison (4/4)

    [Figure: bit-rate (Mbit/s) versus year, 1994 to 2005, for 1st- through 5th-generation MPEG-2 encoders, compared with MPEG-4, H.263, H.26L, and H.264/MPEG-4 Part 10. Source: Modulus Video.]

    Test Set Results for Perceptual Quality

    • Informal perceptual tests
    • At the same PSNR, people generally prefer JVT
    • Why?
      – Small motion compensation block size (breaks up block structure)
      – Small transform block size (breaks up block structure, reduces ringing)
      – In-loop deblocking filter
    • By how much?
      – Needs further study
      – No rigorous testing reported
      – 10-15% might be a good guess

  • 44


    How Were the Improvements Obtained

    • They mainly come from incremental improvements:
      – Better prediction
      – More computation
      – More memory
    • No fundamental changes in the basic algorithm (DCT + MCPC)

    Conclusions

    • Video coding layer is based on hybrid video coding and is similar in spirit to other standards, but with important differences
    • New key features are:
      – Enhanced motion compensation
      – Small blocks for transform coding
      – Improved deblocking filter
      – Enhanced entropy coding
    • Bit-rate savings generally 50% or better against any other standard for the same perceptual quality (especially for higher-latency applications allowing B pictures)
    • Increased complexity relative to prior standards
    • Standard of both ITU-T VCEG and ISO/IEC MPEG
    • Standardization completing around end of this year to Spring of next year