Jun. 2, 2008 Student: Shang-Yu Yeh (葉尚諭) Advisor: Dr. Hsueh-Ming Hang (杭學鳴)


DESCRIPTION

Coding Efficiency and Quality Improvement for MPEG Surround Encoding. Jun. 2, 2008. Student: Shang-Yu Yeh (葉尚諭). Advisor: Dr. Hsueh-Ming Hang (杭學鳴). My Work: design MPEG Surround encoding algorithms (subset coding mode, parameter band stride, parameter set).

TRANSCRIPT

Coding Efficiency and Quality Improvement for MPEG Surround Encoding
Jun. 2, 2008
Student: Shang-Yu Yeh
Advisor: Dr. Hsueh-Ming Hang

My Work
- Design MPEG Surround encoding algorithms
  - Subset coding mode
  - Parameter band stride
  - Parameter set
  - Adaptive parameter smoothing
- Implementation and simulation on the reference software

Outline
- MPEG Surround Introduction
  - Spatial Hearing
  - MPEG Surround Encoder
  - MPEG Surround Decoder
- Proposed Modules and Experimental Results
- Conclusions and Future Work
- Demo

Spatial Hearing
- Describes how humans locate a sound source in the horizontal plane
- Interaural Level Difference (ILD)
- Interaural Time Difference (ITD)
- Interaural Coherence (IC)

Notes: a source directly to the left reaches the two ears with a time delay and a level (intensity) difference; non-coherence between the two ear signals is described by the IC.

MPEG Surround
- Low-bitrate parametric coding technology for multi-channel audio signals
- Backward compatible with conventional stereo systems
- Standardization
  - CfP on SAC in March 2004
  - Finalized in July 2006 (ISO/IEC 23003-1)

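MPEG Surround Encoder
- Captures the spatial image of multi-channel audio
- Generates a mono/stereo downmix

Notes: the N-channel input passes through a filter bank into a 71-band subband domain, where the channels are downmixed and the spatial parameters are extracted. The downmix is synthesized back to the time domain and coded by an audio encoder (e.g., MP3, AAC); the spatial parameters are quantized and coded.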

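MPEG Surround Decoder
- Synthesizes the multi-channel output signal
- Backward compatibility

Notes: the audio decoder decodes the downmix from the bitstream, and the MPEG Surround decoder rebuilds the N-channel signal from the downmix in the filter bank domain. A decoder without MPEG Surround support uses the downmix path only.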

Downmix and Parameter Extraction
- Two elementary blocks construct hierarchical structures
  - R-OTT box (Reverse One-To-Two box)
  - R-TTT box (Reverse Two-To-Three box)

Notes: in the decoder, the OTT box upmixes 1 channel to 2 and the TTT box upmixes 2 channels to 3; the encoder uses the reverse boxes. An R-OTT box takes a 2-channel input and produces a 1-channel output plus spatial parameters. An R-TTT box takes a 3-channel input and produces a 2-channel output plus spatial parameters. For example, 5.1-channel audio is downmixed to 2 channels (525) with R-OTT boxes and an R-TTT box.

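Parameter Sets and Bands
- Parameter sets: grouping of time slots
- Parameter bands: grouping of subbands

Notes: in the 71-band subband domain, a frame of 2048 samples gives 32 samples (time slots) per band. Time slots are grouped into parameter sets (up to 8 per frame), and subbands are grouped non-uniformly into parameter bands. Two parameter sets can be paired for coding; the parameters are then quantized and entropy coded.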

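R-OTT Box
- Creates a mono downmix from a stereo input
- Extracts the relevant spatial parameters
  - Channel Level Differences (CLD) [equation not preserved]
  - Inter-Channel Coherence (ICC) [equation not preserved]

Notes: the CLD describes the level difference between the two channels; the ICC describes their correlation.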
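The slide equations for CLD and ICC did not survive in this transcript. As a minimal sketch, assuming the commonly used definitions (CLD as the energy ratio of the two channels in dB, ICC as the real part of the normalized cross-correlation), the two parameters for one parameter band could be computed as follows. The function name `cld_icc` and the `eps` guard are illustrative and not taken from the reference software.

```python
import math


def cld_icc(x1, x2, eps=1e-12):
    """Return (CLD in dB, ICC) for two equal-length lists of subband samples.

    Assumed definitions (not recovered from the slides):
      CLD = 10 * log10(E1 / E2)
      ICC = Re{sum(x1 * conj(x2))} / sqrt(E1 * E2)
    eps only guards against division by zero for silent channels.
    """
    e1 = sum(abs(a) ** 2 for a in x1)
    e2 = sum(abs(b) ** 2 for b in x2)
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2))
    cld = 10.0 * math.log10((e1 + eps) / (e2 + eps))
    icc = cross.real / math.sqrt((e1 + eps) * (e2 + eps))
    return cld, icc


if __name__ == "__main__":
    # The second channel is the first one scaled by 2: fully coherent, 6 dB quieter first channel.
    left = [1 + 0j, 0 + 1j]
    right = [2 + 0j, 0 + 2j]
    print(cld_icc(left, right))
```

In an encoder the sums would run over all subband samples that fall into one parameter band and one parameter set.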

R-TTT Box
- Creates a stereo downmix from three input channels
- Two ways to reconstruct the 3rd signal
  - Prediction mode: 2 CPCs and ICC
  - Energy mode: 2 CLDs

Notes: in prediction mode, 2 channel prediction coefficients (CPCs) predict the third channel and the ICC accounts for the prediction residual; in energy mode, 2 CLDs describe the energies of the 3 channels.

Quantization and Entropy Coding Schemes
- Quantization: fine and coarse
- Entropy coding: differential coding + Huffman tables

Notes: differential coding runs along frequency (DF) or along time (DT); pilot-based coding is also available. The differential data are coded with Huffman tables, either 1D or 2D (2 data per codeword, paired along frequency, FP, or along time, TP). PCM codes the raw data.
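The two differential schemes named above can be sketched on quantized parameter indices as follows. This is an illustration only: the function names are hypothetical, and it assumes DF sends the first band raw and then band-to-band differences, while DT sends differences against the previous parameter set.

```python
def encode_df(row):
    """Differential coding along frequency: first band raw, then band-to-band differences."""
    if not row:
        return []
    return [row[0]] + [row[b] - row[b - 1] for b in range(1, len(row))]


def decode_df(diffs):
    """Invert encode_df by accumulating the differences."""
    out, acc = [], 0
    for d in diffs:
        acc += d
        out.append(acc)
    return out


def encode_dt(row, prev_row):
    """Differential coding along time: difference to the same band of the previous set."""
    return [c - p for c, p in zip(row, prev_row)]


def decode_dt(diffs, prev_row):
    """Invert encode_dt given the previously decoded set."""
    return [d + p for d, p in zip(diffs, prev_row)]


if __name__ == "__main__":
    previous = [3, 3, 4, 5]
    current = [3, 4, 4, 6]
    print(encode_df(current))            # differences along frequency
    print(encode_dt(current, previous))  # differences along time
```

The smaller and more concentrated the differences are, the fewer bits the Huffman tables need, which is why the standard deviations of the DT and DF data matter in the later results.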

Notes: the reference software encoder implements PCM, DF, and DT coding with 1D Huffman tables.

Outline
- MPEG Surround Introduction
- Proposed Modules and Experimental Results
  - Subset coding mode
  - Parameter band stride
  - Parameter set
  - Adaptive parameter smoothing
- Conclusions and Future Work
- Demo

New Encoder Structure
- 4 additional modules [block diagram not preserved]

Subset Coding Mode
- 4 coding modes for each parameter subset:
  - Default (0)
  - Keep (1)
  - Interpolation (2)
  - Lossless (3)
- The reference software implements only the Lossless mode

Notes: the specification syntax provides these 4 modes for every parameter subset of a parameter set. The reference software encoder implements only the lossless mode, although the decoder supports the other 3 modes.

Subset Coding Mode
- Flow chart:
  - Search each mode for the least error
  - Compare the error with a threshold
- Exploits correlation along time

Notes: the errors of modes 0, 1, and 2 are computed for each subset. If the minimum error is below the threshold, that mode is chosen; otherwise the subset is coded in the lossless mode.
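The decision described in the flow chart can be sketched as below. This is a minimal illustration under stated assumptions, not the author's implementation: the error measure (mean absolute difference), the midpoint rule used for the interpolation mode, and the function name `choose_mode` are all assumptions, since the transcript does not preserve them.

```python
DEFAULT, KEEP, INTERPOLATE, LOSSLESS = 0, 1, 2, 3


def mean_abs_error(a, b):
    """Assumed error measure: mean absolute difference between two parameter vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def choose_mode(current, previous, following, default, threshold):
    """Pick the cheapest mode whose reconstruction error stays within the threshold.

    current   : parameters of the subset being coded
    previous  : parameters the decoder already holds (used by the keep mode)
    following : parameters of the next set (used by the interpolation mode)
    default   : default parameter values (used by the default mode)
    """
    candidates = {
        DEFAULT: default,
        KEEP: previous,
        # Assumption: interpolation is approximated by the midpoint of its neighbors.
        INTERPOLATE: [(p + n) / 2 for p, n in zip(previous, following)],
    }
    errors = {mode: mean_abs_error(current, rec) for mode, rec in candidates.items()}
    best = min(errors, key=errors.get)
    # Only the lossless mode costs bits, so it is the fallback.
    return best if errors[best] <= threshold else LOSSLESS


if __name__ == "__main__":
    print(choose_mode([5, 5, 5], [5, 5, 5], [9, 9, 9], [0, 0, 0], threshold=0.5))
```

Because modes 0 to 2 cost no parameter bits, every subset that passes the threshold test translates directly into a bitrate saving.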

Experimental Results
- Only the Lossless mode costs bits
- The bitrate reduction can be estimated: [equation not preserved]

Bitrate reduction (%), theoretical (theo) and experimental (exp), for 1, 2, and 4 parameter sets (ps):

Test sequence  theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
1              59.04     44.57    51.80     34.64    35.28     20.51
2              75.74     55.46    77.66     55.16    74.59     52.79
3              66.85     47.98    58.86     39.41    40.01     23.66
4              59.53     44.06    50.97     34.11    33.40     19.08
5              63.37     47.04    59.24     40.96    42.06     26.05

Experimental Results
- 2 observations:
  - Theoretical results are larger than experimental results
    - Differential coding schemes
  - As the number of parameter sets increases, the theoretical and experimental results decrease
    - Probability distributions

Experimental Results
- Distributions of DT data [pdf plots not preserved]

Standard deviation of DT data:

      ps=1  ps=4
CLD   1.77  2.13
ICC   0.84  1.21

Parameter Band Stride
- The parameter bands cannot be adjusted
- The frequency resolution is adjusted by the parameter band stride
- 4 strides for each parameter subset

Parameter groups using different strides:

Parameter bands  Stride 1  Stride 2  Stride 5  Stride 28
4                4         2         1         1
5                5         3         1         1
7                7         4         2         1
10               10        5         2         1
14               14        7         3         1
20               20        10        4         1
28               28        14        6         1

Notes: striding merges neighboring parameter bands (pb) into parameter groups (pg). The number of groups is the ceiling of pb / stride; for example, 14 parameter bands with stride 5 give 3 groups.
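The table of parameter groups follows from the ceiling rule in the notes. A short sketch that reproduces it (the function name `parameter_groups` is illustrative):

```python
PARAMETER_BANDS = [4, 5, 7, 10, 14, 20, 28]
STRIDES = [1, 2, 5, 28]


def parameter_groups(num_bands, stride):
    """Number of parameter groups: ceil(num_bands / stride), in integer arithmetic."""
    return (num_bands + stride - 1) // stride


if __name__ == "__main__":
    for pb in PARAMETER_BANDS:
        print(pb, [parameter_groups(pb, s) for s in STRIDES])
```

A larger stride means fewer coded values per subset, at the price of coarser frequency resolution.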

Parameter Band Decision
- Combined with the pairing decision
- Flow chart:
  - 2 successive lossless subsets
  - 1 single subset
- Exploits correlation along frequency

Notes: the stride decision removes redundancy along frequency and is made together with the pairing decision. It applies to subsets coded in the lossless mode (mode 3): 2 successive lossless subsets are tested as a pair, and a lone lossless subset is handled on its own.

Parameter Band Decision
- 4 possible results:
  - 2 successive subsets in a pair with the same stride (>1)
  - 2 successive subsets using different strides (>1)
  - 2 successive subsets in a pair with stride = 1
  - 1 subset coded individually

Experimental Results
- The bitrate can be estimated by: [equation not preserved]

Bitrate reduction (%), theoretical (theo) and experimental (exp), for 1, 2, and 4 parameter sets (ps):

Test sequence  theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
1              45.28     23.29    36.10     15.58    27.64     9.95
2              51.78     21.87    48.57     20.96    46.28     19.54
3              40.04     17.16    34.61     13.99    28.87     10.71
4              45.99     23.52    39.89     18.65    32.51     13.18
5              44.93     23.02    38.65     18.29    31.87     13.37

Experimental Results
- 2 observations:
  - Theoretical results are larger than experimental results
    - Differential coding schemes
  - As the number of parameter sets increases, the theoretical and experimental results decrease
    - Probability distributions

Experimental Results
- Distributions of DF data [pdf plots not preserved]

Standard deviation of DF data:

      ps=1  ps=4
CLD   2.8   3.02
ICC   1.47  1.75

Comparisons of Previous 2 Modules
- Using the coding mode is more efficient than using pbstride
- Compare the DT and DF data

Standard deviation of the DT and DF data:

               DT                          DF
               ICC         CLD             ICC         CLD
Test sequence  ps=1  ps=4  ps=1  ps=4      ps=1  ps=4  ps=1  ps=4
1              1.07  1.31  1.37  1.66      1.85  2.14  2.95  3.27
2              0.81  0.86  2.15  1.4       1.85  1.98  4.14  4.24
3              0.79  1.17  1.94  1.75      1.61  1.89  3.14  3.39
4              0.84  1.21  1.77  2.13      1.47  1.75  2.8   3.02
5              0.92  1.14  1.46  1.61      1.77  1.99  2.94  3.19

Notes: in every case the standard deviation of the DT data is smaller than that of the DF data, so the correlation along time is stronger than the correlation along frequency, and the coding mode module gains more than the stride module.

Comparisons of Previous 2 Modules
- The pbstride estimates are overestimated more than the coding mode estimates
  - Differential coding schemes

Notes: correlation: 0.992 (coding mode), 0.938 (pbstride).

Experimental Results: Combined with Coding Mode
- Bitrate reduction percentage: 25~55%
- Complexity: 0.13%

Bitrate reduction (%):

Test sequence  ps1    ps2    ps4
1              54.14  42.08  27.06
2              58.36  57.38  55.36
3              50.37  43.07  29.04
4              52.75  42.20  27.36
5              54.52  48.06  33.40

Parameter Set
- Describes the number of parameters for each parameter band
- 2 kinds of framing:
  - Fixed framing: the frame is divided into equal parts
  - Variable framing: arbitrary divisions
- 1~8 parameter sets
- Requires a dynamic decision

Time Resolution
- A border is assigned where nearby parameters differ greatly
- Calculate the differences between the backward and forward extractions
- Set time borders at the time slots with larger differences

Time Resolution Decision
- Flow chart [not preserved]

Notes: the per-slot differences are searched for peaks; peaks above a threshold are candidates for time borders within the frame.
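The flow chart for this decision is not preserved, so the following is only a rough sketch of the idea stated on the slides: compare a backward and a forward estimate at each time slot and place borders where they differ most. The averaging window, the threshold test, the minimum spacing between borders, and the name `pick_borders` are all assumptions made for illustration.

```python
def pick_borders(slot_values, window, threshold, max_sets=8):
    """Return time slots at which a new parameter set would start.

    slot_values : one parameter estimate per time slot (e.g. a per-slot CLD)
    window      : number of slots used for the backward and forward averages
    threshold   : minimum backward/forward difference for a border candidate
    max_sets    : upper bound on parameter sets per frame (the slides state 1~8)
    """
    n = len(slot_values)
    candidates = []
    for t in range(window, n - window + 1):
        backward = sum(slot_values[t - window:t]) / window
        forward = sum(slot_values[t:t + window]) / window
        diff = abs(forward - backward)
        if diff > threshold:
            candidates.append((diff, t))

    # Strongest differences first; keep borders at least `window` slots apart.
    candidates.sort(key=lambda c: c[0], reverse=True)
    borders = []
    for _, t in candidates:
        if len(borders) >= max_sets - 1:
            break
        if all(abs(t - b) >= window for b in borders):
            borders.append(t)
    return sorted(borders)


if __name__ == "__main__":
    # A level jump in the middle of a 32-slot frame.
    frame = [0.0] * 16 + [10.0] * 16
    print(pick_borders(frame, window=4, threshold=5.0))
```

A stationary frame yields no borders and can stay at one parameter set, which keeps the additional bitrate low for most material.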

Experimental Results
- Waveforms [figure not preserved]:
  - (a) original signal
  - (b) decoded signal using ps=1
  - (c) decoded signal using adaptive decision

Experimental Results
- Additional bitrate:

Test sequence           1     2     3     4      5
Additional bitrate (%)  4.09  4.83  6.34  24.78  4.00

- Complexity: iterations = 9 (par) * 32 (slot) * 71 (hyband) * 2L (window)

Parameter Smoothing
- Compensates for artifacts caused by coarse quantization
- Performed at the decoder side
- 1st-order IIR filter

Notes: coarse quantization causes artifacts, so the decoder applies temporal smoothing with a 1st-order IIR filter. The encoder determines the time constant tau, chosen from 4 values: 64, 128, 256, 512.

Parameter Smoothing
- Flow chart:
  - Compare the coarse-quantized and smoothed parameters with the fine-quantized parameters
  - Choose the configuration with the least error
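The encoder-side choice described in the flow chart can be sketched as follows. This is a simplified illustration, not the standard's exact filter: it assumes a generic first-order IIR of the form y[n] = a * x[n] + (1 - a) * y[n-1] with a = step / tau, where `step` (the distance between parameter updates, in the same unit as tau) and the squared-error measure are assumptions. Only the four tau values come from the slide notes.

```python
TAUS = [64, 128, 256, 512]  # time constants from the slide notes (unit assumed)


def smooth(values, alpha):
    """First-order IIR smoothing: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    if not values:
        return []
    out = []
    prev = values[0]
    for v in values:
        prev = alpha * v + (1.0 - alpha) * prev
        out.append(prev)
    return out


def squared_error(a, b):
    """Assumed error measure: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_smoothing(fine, coarse, step, taus=TAUS):
    """Return (tau, error); tau is None when unsmoothed coarse data is already best.

    fine   : fine-quantized parameters, used as the reference
    coarse : coarse-quantized parameters that would actually be transmitted
    step   : assumed distance between parameter updates, same unit as tau
    """
    best_tau, best_err = None, squared_error(fine, coarse)
    for tau in taus:
        alpha = min(1.0, step / tau)
        err = squared_error(fine, smooth(coarse, alpha))
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau, best_err


if __name__ == "__main__":
    fine = [1.0] * 8
    coarse = [1, 1, 1, 3, 1, 1, 1, 1]  # one quantization outlier
    print(choose_smoothing(fine, coarse, step=32))
```

Since the decision only compares reconstructions against the fine-quantized reference, it needs no extra signal analysis beyond the parameters the encoder already has.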

Notes: for each of the 4 tau values (i = 0~3), the smoothed coarse-quantized data are compared with the fine-quantized data, and the error decides the configuration for each subset.

Experimental Results
- Quantization error [figure not preserved]

Experimental Results
- Waveforms [figure not preserved]:
  - (a) fine quantized
  - (b) coarse quantized
  - (c) coarse quantized and smoothed

Experimental Results
- Bitrate variations:

Test sequence                            1       2      3      4       5
Bitrate change % (cf. coarse quantized)  0.51    0.55   0.69   0.64    0.53
Bitrate change % (cf. fine quantized)    -11.53  -7.37  -7.03  -10.93  -11.00

- Complexity: 0.4%

Notes: compared with coarse quantization alone, the tool costs less than 1% extra bitrate (syntax bits per frame); compared with fine quantization, it saves about 10%.

Conclusions
- Design and implementation of four encoding modules in the MPEG Surround encoder
- Exploit correlation along the time axis and the frequency axis
  - Bitrate reduction: 25~55%
  - Theoretical estimation
- Adaptive time resolution and parameter smoothing

Future Work
- Modify the error measures
  - Different band weights
  - Different parameter weights
- Find a more precise evaluation of quality for fine-tuning
- Some other tools
  - Residual coding, temporal shaping, etc.

Demo

Appendix

Experimental Results
- Waveforms [figure not preserved]

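Filter Banks
- 2 stages

Notes: the encoder and decoder use an analysis filter bank with 2 stages. The first stage is a uniform 64-band QMF filter bank, as used in SBR. To increase the low-frequency resolution, the lowest 3 QMF bands are filtered further: QMF band 0 is split into 6 sub-subbands, and QMF bands 1 and 2 are split into 2 sub-subbands each; the remaining QMF bands are delayed. The result is 71 bands.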
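The count of 71 hybrid bands follows from the split described in the notes, which a few lines of arithmetic confirm. The dictionary name `SUB_SUBBANDS` is illustrative.

```python
QMF_BANDS = 64
# Second-stage split from the notes: QMF band index -> number of sub-subbands.
SUB_SUBBANDS = {0: 6, 1: 2, 2: 2}


def hybrid_band_count(qmf_bands=QMF_BANDS, split=SUB_SUBBANDS):
    """Unsplit QMF bands plus the sub-subbands that replace the split ones."""
    return qmf_bands - len(split) + sum(split.values())


if __name__ == "__main__":
    print(hybrid_band_count())
```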

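OTT Box
- Synthesizes two channels from a mono downmix with the spatial parameters [equations not preserved]

Notes: the CLD sets the energies of the outputs X1 and X2 relative to the mono downmix Xs, and the ICC sets the correlation between X1 and X2 by mixing in the decorrelator output Xd.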

R-TTT Box (2/2)
- Prediction mode: 2 CPCs and 1 ICC [equations not preserved]
- Energy mode: 2 CLDs [equations not preserved]

Notes: in prediction mode, 2 channel prediction coefficients (CPCs) are used and the ICC accounts for the prediction error; in energy mode, 2 CLDs describe the energy ratios of the 3 channels.

TTT Box
- Prediction mode:
  - With residual signal: 2 CPCs
  - Without residual signal: use the ICC to compensate for the energy loss
- Energy mode: energy reconstruction

Experimental Results
- Comparisons [figures not preserved]: 50~60% consistent

Experimental Results
- Pbstride [figure not preserved]: 60~70% consistent

Bitrate reduction % without any error

DataMode: dm0_xxx

Input    theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
Input01  11.45     3.58     11.15     2.49     11.14     2.34
Input02  40.60     24.03    39.56     20.63    38.36     19.81
Input03  14.23     5.49     13.28     3.76     13.12     3.50
Input04  11.45     3.64     11.14     2.52     11.12     2.28
Input05  12.03     4.10     11.46     2.71     11.39     2.54

DataMode: dmx_000

Input    theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
Input01  10.57     1.03     9.59      0.39     9.13      0.40
Input02  33.67     10.54    32.68     10.76    32.11     10.58
Input03  13.88     2.39     12.00     1.32     11.13     1.21
Input04  9.47      0.66     9.32      0.47     9.07      0.40
Input05  10.10     1.06     9.41      0.46     8.92      0.43

Reference Software Encoder
- Parameter set = 1
- Parameter band = 20
- Tree structure: 5151, 5152, 525
- Time slots: 16, 32
- Fine quantization
- Differential in T/F, PCM + 1D Huffman

DT distributions of CLD and ICC for sequences 1, 2, 3, and 5 [figure not preserved]

Prediction Mode of R-TTT Box
- 2 ways of decoding:
  - With residual signal
  - Without residual signal: use the ICC to compensate for the energy loss
- How to decide appropriate CPCs and ICC?

Prediction Mode of R-TTT Box
[equations not preserved]

Notes: when ICC = 1, the residual energy is 0.

Prediction Mode of R-TTT Box
- Choose CPCs that make the prediction more precise
- Residual energy -> 0: good prediction
- Not verified yet, since the coder is not considered


[Figure: MPEG Surround Encoder block diagram. Blocks: T/F Transform, Downmix, Spatial Parameter Estimation, F/T Transform, Audio Encoder. Outputs: Compressed Audio Bitstream, Spatial Parameters.]

[Figure: MPEG Surround Decoder block diagram. Blocks: Audio Decoder, T/F Transform, Surround Synthesis, F/T Transform. Inputs: Compressed Audio Bitstream, Spatial Parameters. A separate path is labeled Legacy Decoding.]