
    WHITE PAPER

    Tensilica White Paper October 25, 2010 www.tensilica.com

    Put Low-Power, Low-Overhead, High-Fidelity Digital Sound in Your Next ASIC or SOC

    High-quality audio creates an immersive experience that excites buyers and spurs the purchase of consumer products such as home theater and PC sound systems, flat-panel televisions, handheld and console video games, portable music and video players, and mobile telephone handsets. As a result, digital audio has rocketed to the top of the critical-features list for all sorts of products over the past several years. At the same time, the number of digital audio codecs (coders and decoders) and audio-enhancement programs has exploded, and most consumer products must now support multiple codecs and offer a broad range of audio-enhancement features.

    All of these factors have created high demand for flexible, high-performance, low-power audio DSPs that add sound to an SOC design with the least design effort and a small on-chip footprint. Tensilica's HiFi Audio DSP family was carefully crafted to meet these requirements for the broadest possible range of consumer products. The HiFi 2 and HiFi EP Audio DSPs are available as audio-extension packages for the Xtensa LX configurable processor core, and the HiFi 2 DSP is also incorporated into the pre-configured 330HiFi processor core.

    Why use programmable SOC hardware for audio?

    The ASICs developed for the original digital music players did not use programmable MP3 decoders. Instead, they used dedicated hardware because that approach resulted in the lowest gate count. However, for products that must handle multiple digital audio codecs (not just MP3 playback), the need for multiple hardware codecs quickly increases gate count, and hardware codecs soon lose their advantage. For products that must support two or more digital audio file formats, a programmable approach is far more attractive. Consequently, nearly all new ASIC and SOC designs incorporating digital audio employ some form of programmable DSP to run the audio codecs.

    In addition to supporting multiple audio codecs, the programmable approach actually reduces design risk. Future changes to an existing codec standard or to the product's definition can be accommodated through a firmware change instead of a hardware change, which would require a silicon respin.

    The programmable nature of a processor-based approach also makes it far easier to add application-specific pre- and post-processing to the end product. Such features include sample-rate conversion, multi-band frequency equalization, speaker virtualization, and parsing (extracting audio content from files that use various container formats).
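    The multi-codec and pre/post-processing argument above can be sketched in C. In firmware, each codec and each processing stage is just an entry in a table, so supporting a new format or adding an effect changes software, not silicon. This is a minimal sketch under stated assumptions. The names (`codec_ops`, `find_codec`, `play_frame`, `halve_gain`) are hypothetical and are not a Tensilica API, and the decoder is a stub rather than a real codec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical codec interface: decode one encoded frame into PCM.
   Returns the number of samples written, or -1 on error. */
typedef int (*decode_fn)(const uint8_t *in, size_t in_len,
                         int16_t *pcm, size_t pcm_cap);

typedef struct {
    const char *format;   /* bit-stream or container format name */
    decode_fn decode;
} codec_ops;

/* Stub standing in for a real MP3/AAC decoder: it maps each input
   byte to a 16-bit sample so the plumbing can be exercised. */
static int stub_decode(const uint8_t *in, size_t in_len,
                       int16_t *pcm, size_t pcm_cap) {
    size_t n = in_len < pcm_cap ? in_len : pcm_cap;
    for (size_t i = 0; i < n; i++)
        pcm[i] = (int16_t)(((int)in[i] - 128) * 256);
    return (int)n;
}

/* Supporting another format is a firmware change: add a table entry. */
static const codec_ops codec_table[] = {
    { "mp3", stub_decode },
    { "aac", stub_decode },
};

static const codec_ops *find_codec(const char *format) {
    for (size_t i = 0; i < sizeof codec_table / sizeof codec_table[0]; i++)
        if (strcmp(codec_table[i].format, format) == 0)
            return &codec_table[i];
    return NULL;                      /* format not supported */
}

/* A post-processing stage (equalizer, sample-rate converter, etc.). */
typedef void (*post_fn)(int16_t *pcm, size_t n);

/* Example stage: attenuate by about 6 dB (halve each sample). */
static void halve_gain(int16_t *pcm, size_t n) {
    for (size_t i = 0; i < n; i++)
        pcm[i] = (int16_t)(pcm[i] / 2);
}

/* Decode one frame, then apply each post-processing stage in order. */
static int play_frame(const char *format, const uint8_t *in, size_t in_len,
                      int16_t *pcm, size_t pcm_cap,
                      const post_fn *stages, size_t n_stages) {
    const codec_ops *codec = find_codec(format);
    if (codec == NULL)
        return -1;
    int n = codec->decode(in, in_len, pcm, pcm_cap);
    if (n < 0)
        return -1;
    for (size_t s = 0; s < n_stages; s++)
        stages[s](pcm, (size_t)n);
    return n;
}
```

    For example, a call such as `play_frame("mp3", frame, len, pcm, cap, stages, 1)` decodes with the stub and then halves each sample. An unknown format such as `"flac"` returns -1 until a table entry is added.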

    Why not just use a RISC processor or DSP for audio?

    General-purpose processors are often used to implement audio codecs because they can do anything that can be expressed in a firmware program, given enough execution cycles. With a sufficiently fast clock, a general-purpose processor can implement multiple audio codec algorithms, so it is suitable for multipurpose devices that must accommodate several digital-audio standards. However, general-purpose processors are not especially efficient engines for running audio codecs. They must therefore run at higher clock rates to deliver real-time, multichannel audio. In turn, the higher clock rates cause these processors to consume more energy than processors with audio-specific instruction sets. Higher power dissipation is a real problem for portable, battery-powered devices.
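    The clock-rate argument above can be made concrete with a simple budget. Suppose a codec needs C_frame cycles to produce a frame of N samples at sample rate f_s. The processor must then sustain at least

$$
f_{\mathrm{clk}} \;\ge\; C_{\mathrm{frame}} \cdot \frac{f_s}{N}
$$

    For example, take a hypothetical cost of 200,000 cycles per 1024-sample frame at f_s = 48 kHz, which is 46.875 frames per second. The minimum clock is then 200,000 × 46.875 ≈ 9.4 MHz. An engine that needs half as many cycles per frame can meet the same real-time deadline at half the clock rate.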

    DSPs generally run audio codecs more efficiently than general-purpose processors do because they have features, such as hardware MACs (multiplier/accumulators), that accelerate audio-specific code. However, DSPs have specialized and irregular architectures that make them poor targets for C compilers, so they are not attractive as control processors. Consequently, DSPs are usually paired with a general-purpose control processor to create a design platform with enough capability and flexibility for today's consumer products.
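    A direct-form FIR filter shows what a hardware MAC accelerates: each tap is a multiply whose product is added to a running sum. A core with a single-cycle MAC can retire one tap per cycle, while a processor without one issues separate multiply and add instructions. The sketch below is plain, portable C. The Q15 sample format and the function name are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct-form FIR filter on Q15 samples: y = sum over k of h[k] * x[k],
   where x[0] is the newest input sample and x[k] is k samples older.
   The 64-bit accumulator stands in for a DSP's wide MAC accumulator,
   so summing many 30-bit products cannot overflow. */
static int16_t fir_q15(const int16_t *x, const int16_t *h, size_t taps) {
    int64_t acc = 0;
    for (size_t k = 0; k < taps; k++)
        acc += (int32_t)h[k] * x[k];      /* one multiply-accumulate per tap */
    acc >>= 15;                           /* Q30 sum back to Q15
                                             (arithmetic shift assumed) */
    if (acc > INT16_MAX) acc = INT16_MAX; /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```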

    Dual-processor configurations create their own design problems. These configurations split the audio-processing duties across more than one processor, or across a processor and a hardware accelerator block, so the audio-processing tasks must be apportioned to each block. The audio operations running on each block must then be coordinated, which increases software-development complexity. The two processors (usually a general-purpose RISC processor and a DSP) must also be programmed with dissimilar software-development tools, which further complicates software development.

    System developers implementing audio codecs and other audio-processing software for ASICs and SOCs have another design alternative: application-tailored RISC processors, which start out as general-purpose processors capable of running any software program. Processor tailoring can add audio-specific extensions to the basic RISC architecture, including new instructions, registers, and data types. These extensions accelerate code execution for the target audio applications (codecs and sound-shaping algorithms) while allowing the processor to remain a good C compiler target.

    The HiFi 2 and HiFi EP Audio DSPs

    Tensilica offers two HiFi audio DSP options for the Xtensa LX customizable processor. Just click the right configuration option box, and you'll add audio to your design. You can then customize other processor functions, such as memories and interfaces, as well as model your chip design and profile your software (more about that later).

    HiFi 2 is Tensilica's workhorse audio DSP and is suited to most applications. HiFi EP is a superset of HiFi 2 with advanced optimizations for DTS Master Audio, improved voice pre- and post-processing, and an improved cache memory subsystem. Because HiFi EP is a superset of HiFi 2, this white paper uses "HiFi" to mean both DSP options as well as the 330HiFi pre-configured core. We'll call out the distinctive HiFi EP features where applicable.

    The HiFi DSPs are options for Tensilica's Xtensa LX processor, which gives them a significant advantage over other DSPs on the market. The HiFi extensions deliver better audio performance by exploiting two sources of parallelism. The first is the added parallelism that the Xtensa LX architecture provides through flexible-length instruction extensions (FLIX). The second is the associated XCC vectorizing C/C++ compiler, which automatically exploits the parallelism that the FLIX hardware extensions make available. FLIX instruction extensions allow processors based on the Xtensa LX architecture to issue multiple independent operations each clock cycle. Like all Xtensa instructions, FLIX instructions are first-class citizens of the processor's instruction set. They are recognized and used by all of the processor's software-development tools, including its C/C++ compiler, assembler, debugger, and instruction-set simulator.

    Tensilica's HiFi extensions comprise a set of more than 300 new audio-specific operations, grouped as shown in Table 1.

    Group | HiFi 2 Operations
    1     | Loads and Stores
    2     | Single/Dual Multiply with 56-bit Accumulator
    3     | Scalar and 2-Way SIMD ALU Operations
    4     | Variable/Immediate Shifts
    5     | Convert/Round/Truncate/Saturate Operations
    6     | Huffman Encode/Decode and Bit-Stream Support

    Table 1: The Six Operation Groups of the HiFi Audio DSPs

    These HiFi instructions are teamed with two audio-specific register files. The first is an 8-entry file named P with 48 bits per entry (each entry can hold two 24-bit audio values). The second is a 4-entry file named Q with 56-bit entries. The values in the Q register file are generated by a set of instructions that control a dual multiplier/accumulator (MAC). Each of the two pipelined multipliers can perform a 24x24-bit or a 32x16-bit multiplication with a throughput of one multiplication per multiplier per cycle. The results from the two multipliers are accumulated in the 56-bit Q registers.

    (Note: the 32x16-bit operation mode for the multipliers is particularly helpful for high-precision arithmetic.)
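    The register widths above can be modeled bit-accurately in C. The sketch below is an illustrative model, not the documented HiFi instruction semantics. A P entry holds two signed 24-bit lanes. A hypothetical dual-MAC step multiplies the lanes pairwise and adds both products to one 56-bit Q accumulator, saturating on overflow. Each 24x24-bit product needs at most 48 bits, so a 56-bit accumulator leaves 8 guard bits: at least 256 worst-case products can be summed before saturation is needed.

```c
#include <stdint.h>

/* A P-register entry modeled in the low 48 bits of a uint64_t:
   bits 47..24 hold the high 24-bit lane, bits 23..0 the low lane. */
typedef uint64_t p_entry;

#define Q56_MAX ((int64_t)0x7FFFFFFFFFFFFF)   /* 2^55 - 1 */
#define Q56_MIN (-Q56_MAX - 1)                /* -2^55    */

/* Sign-extend the low 24 bits of v without implementation-defined casts. */
static int32_t sext24(uint32_t v) {
    return (int32_t)((v & 0xFFFFFFu) ^ 0x800000u) - 0x800000;
}

/* Pack two signed 24-bit values into one 48-bit P entry. */
static p_entry p_pack(int32_t hi, int32_t lo) {
    return ((uint64_t)((uint32_t)hi & 0xFFFFFFu) << 24) |
           ((uint32_t)lo & 0xFFFFFFu);
}

static int32_t p_hi(p_entry p) { return sext24((uint32_t)(p >> 24)); }
static int32_t p_lo(p_entry p) { return sext24((uint32_t)p); }

/* Clamp to the signed 56-bit range of a Q-register entry. */
static int64_t sat56(int64_t v) {
    return v > Q56_MAX ? Q56_MAX : v < Q56_MIN ? Q56_MIN : v;
}

/* Hypothetical dual MAC: multiply lane by lane (24x24 -> 48 bits each)
   and add both products to the 56-bit accumulator, with saturation. */
static int64_t dual_mac24(int64_t q, p_entry a, p_entry b) {
    int64_t prod_hi = (int64_t)p_hi(a) * p_hi(b);
    int64_t prod_lo = (int64_t)p_lo(a) * p_lo(b);
    return sat56(q + prod_hi + prod_lo);
}
```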

    Group 1 includes operations that load data from and store data to the P and Q register files. These operations support immediate and indexed addressing modes, with and without automatic address-register updating.

    Group 2 instructions drive the HiFi 2 Audio Engine's dual MAC. Audio codec software uses these instructions for transforms between the time and frequency domains. It also uses them for windowing, frequency-band splitting, sample-rate conversion, and special audio effects such as reverb and three-dimensional sound simulation.

    Group 3 operations perform scalar arithmetic and Boolean functions on 56-bit words stored in the Q register file. They also perform 2-way SIMD arithmetic and Boolean functions on the 48-bit (paired 24-bit) words stored in the P register file.

    Group 4 instructions include shift operations used for normalization, which maximizes the dynamic range of the HiFi 2 Audio Engine's fixed-point calculations.

    Group 5 operations include conversion, rounding, truncation, and saturation functions for the P and Q register files. Operations in this group also permit data exchanges between the P and Q register files and the processor's base 32-bit register file.

    Group 6 includes Huffman encoding and decoding operations (used for noiseless coding and decoding) and bit-stream support operations (used for efficient bit-stream packing and unpacking).
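    Group 6 exists because compressed audio bit streams are consumed a few bits at a time, and Huffman codes are decoded one bit (or a few bits) per step. The portable C below shows the kind of loop that bit-stream and Huffman operations are meant to shorten. It reads bits MSB-first, as MPEG audio bit streams do, and decodes a made-up three-symbol prefix code. The code table and all names are hypothetical, and real codecs use larger, table-driven decoders.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;
    size_t len;      /* buffer length in bytes */
    size_t pos;      /* current bit position from the start of buf */
} bit_reader;

/* Read n bits (1 <= n <= 32), MSB first. Bits past the end read as 0. */
static uint32_t get_bits(bit_reader *br, unsigned n) {
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++, br->pos++) {
        size_t byte = br->pos >> 3;
        uint32_t bit = 0;
        if (byte < br->len)
            bit = (br->buf[byte] >> (7 - (br->pos & 7))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

/* Hypothetical prefix code: 0 -> 'a', 10 -> 'b', 11 -> 'c'.
   Each row is a tree node indexed by the next bit. A value >= 0 is the
   index of the next node; a value < 0 is a leaf holding -(symbol + 1). */
static const int huff_tree[2][2] = {
    { -('a' + 1), 1 },            /* node 0: bit 0 -> 'a', bit 1 -> node 1 */
    { -('b' + 1), -('c' + 1) },   /* node 1: bit 0 -> 'b', bit 1 -> 'c'    */
};

/* Walk the tree one bit at a time until a leaf is reached. */
static int huff_decode(bit_reader *br) {
    int node = 0;
    for (;;) {
        int next = huff_tree[node][get_bits(br, 1)];
        if (next < 0)
            return -next - 1;
        node = next;
    }
}
```

    For example, the byte 0x58 (binary 01011000) decodes as 'a', 'b', 'c', 'a' followed by padding bits.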

    HiFi EP adds further instructions that optimize the multiply/accumulate (Group 2), arithmetic (Group 3), and shift (Group 4) operations.
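    The shift (Group 4) and conversion (Group 5) operations described above often work together at the end of a computation. A 56-bit accumulator value is normalized or scaled, then rounded and saturated back into a 24-bit lane. The scalar C below sketches that sequence. The round-half-up convention and the function names are assumptions, not a description of specific HiFi instructions.

```c
#include <stdint.h>

#define Q_BITS 56   /* accumulator width being modeled */

/* Count the redundant sign bits of a 56-bit signed value: the number of
   left shifts that keep it representable in 56 bits. Shifting by this
   amount (normalization) preserves the most significant bits for later
   fixed-point steps. */
static int norm56(int64_t v) {
    if (v < 0)
        v = ~v;                       /* leading ones become leading zeros */
    int n = 0;
    while (n < Q_BITS - 1 && (v & ((int64_t)1 << (Q_BITS - 2 - n))) == 0)
        n++;
    return n;                         /* 0 and -1 both give 55 */
}

/* Drop `shift` low-order bits with round-half-up, then saturate to the
   signed 24-bit range of one P-register lane. */
static int32_t round_sat24(int64_t acc, int shift) {
    if (shift > 0)   /* arithmetic right shift assumed for negative values */
        acc = (acc + ((int64_t)1 << (shift - 1))) >> shift;
    if (acc > 0x7FFFFF)
        return 0x7FFFFF;
    if (acc < -0x800000)
        return -0x800000;
    return (int32_t)acc;
}
```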

    Very Un-RISC-Like but a Good Compiler Target

    Adding more than 300 audio-specific instructions to a RISC processor makes the compiled target application code very efficient, which was a primary goal in the design of the HiFi Audio DSP. It is, however, a very un-RISC-like concept. Such application-specific processor tailoring would be prohibitively expensive, essentially impractical, without an automated tool like Tensilica's Xtensa Processor Generator to construct the processor cores and the associated software-development tools.

    A detailed examination of all of the HiFi operations in the six operation groups is available in the technical documentation. A quick look at the HiFi MAC operation group, however, illustrates the flexibility of the Xtensa LX architecture's FLIX-format instructions. Figure 1 illustrates the HiFi instruction formats.

    Figure 1: The HiFi Instruction Formats (64-bit: Operation 1 in Slot 1, bits 63-27, and Operation 0 in Slot 0, bits 26-0; also 24-bit and 16-bit single-operation formats)
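    As an illustration only, the two-slot bundle can be modeled by splitting a 64-bit word at the boundary drawn in Figure 1, with Operation 1 in bits 63-27 and Operation 0 in bits 26-0. Real Xtensa FLIX encodings also carry format-identification bits and define their own field layouts. The split and the helper names below are therefore assumptions, not the documented encoding. The point is only that one instruction word carries two independent operations, one per slot.

```c
#include <stdint.h>

/* Illustrative only: split a 64-bit bundle at the boundary drawn in
   Figure 1. Real FLIX encodings also contain format-identification
   bits, so actual field positions differ. */
#define SLOT0_BITS 27u                    /* Operation 0: bits 26..0  */
#define SLOT1_BITS (64u - SLOT0_BITS)     /* Operation 1: bits 63..27 */
#define SLOT0_MASK ((UINT64_C(1) << SLOT0_BITS) - 1)
#define SLOT1_MASK ((UINT64_C(1) << SLOT1_BITS) - 1)

/* Place two operation fields into one hypothetical 64-bit bundle. */
static uint64_t make_bundle(uint64_t op1, uint64_t op0) {
    return ((op1 & SLOT1_MASK) << SLOT0_BITS) | (op0 & SLOT0_MASK);
}

static uint64_t slot0_op(uint64_t bundle) { return bundle & SLOT0_MASK; }
static uint64_t slot1_op(uint64_t bundle) { return bundle >> SLOT0_BITS; }
```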

    The MAC instructions can combine the following functions, and very complex instructions can be built from these MAC primitives:

    Single or dual multiplication

    Fractional or integer arithmetic

    24x24-bit, 16x16-bit, or 32x16-bit operands

    Overwrite, add, or subtract accumulation with or without saturation

    Signed or unsigned arithmetic
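    Because these choices combine orthogonally, a single parameterized model covers many instruction variants. The C sketch below models one signed 24x24-bit MAC step with three of the listed options: fractional or integer arithmetic, overwrite/add/subtract accumulation, and optional saturation to 56 bits. The names are hypothetical. The fractional mode assumes the common DSP convention in which a Q1.23 × Q1.23 product is doubled into Q1.47. Tensilica's documentation defines the actual HiFi semantics.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACC_OVERWRITE, ACC_ADD, ACC_SUB } acc_mode;

#define Q56_MAX ((int64_t)0x7FFFFFFFFFFFFF)   /* 2^55 - 1 */
#define Q56_MIN (-Q56_MAX - 1)                /* -2^55    */

/* Reduce v modulo 2^56 and reinterpret it as a signed 56-bit value. */
static int64_t wrap56(int64_t v) {
    const uint64_t sign = UINT64_C(1) << 55;
    uint64_t u = (uint64_t)v & ((UINT64_C(1) << 56) - 1);
    return (int64_t)(u ^ sign) - (int64_t)sign;
}

/* One 24x24-bit MAC step on signed operands in [-2^23, 2^23).
   fractional: treat operands as Q1.23 and double the product (Q1.47).
   mode:       overwrite, add to, or subtract from the accumulator.
   saturate:   clamp to the 56-bit range instead of wrapping. */
static int64_t mac24(int64_t acc, int32_t a, int32_t b,
                     bool fractional, acc_mode mode, bool saturate) {
    int64_t p = (int64_t)a * b;           /* at most a 48-bit product */
    if (fractional)
        p *= 2;
    int64_t r = mode == ACC_OVERWRITE ? p
              : mode == ACC_ADD       ? acc + p
              :                         acc - p;
    if (saturate)
        return r > Q56_MAX ? Q56_MAX : r < Q56_MIN ? Q56_MIN : r;
    return wrap56(r);
}
```

    For example, multiplying 0.5 by 0.5 in fractional mode (`1 << 22` in Q1.23) yields `1 << 45`, which is 0.25 in Q1.47.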

    The base Xtensa LX RISC instructions use operation Slot 0 and are either 16 or 24 bits wide. The HiFi instructions generally use Slot 0 for their load and store operations and for a few audio-processing DSP instructions that access the processor's base 32-bit register file. HiFi uses Slot 1 for most DSP operations.

    Xtensa LX processors with a HiFi option, and the 330HiFi core, handle variable-width instruction streams modelessly. This feature allows the C/C++ compiler's code generator to select and freely intermix instruction sizes to minimize the size of the compiled code. The compiler automatically selects the smallest instruction that will perform the required operation. The result is very compact code that avoids the code bloat generally associated with multi-operation VLIW processors.

    Figure 2 provides a simplified view of the HiFi register files and datapath, organized visually to match the two-slot FLIX instruction format shown in Figure 1. The 16- and 24-bit instructions share the base RISC processor's execution hardware (registers and pipeline) with operations in the lower operation slot (called Slot 0). The rest of the datapath extensions occupy Slot 1.

    Figure 2: HiFi Register Files and Datapath

    The base RISC processor instructions and the HiFi Audio DSP's 16- and 24-bit instructions control only the resources assigned to Slot 0. The 64-bit HiFi instructions can control all of the processor resources in both slots. In a 64-bit instruction, the operation in the lower part of the instruction word controls the resources assigned to Slot 0. The operation in the upper part (labeled Operation 1 in Figure 1) controls the resources assigned to Slot 1. It's important to note that there are two operation slots but only one processor. This keeps the programming model simple and allows the audio-enhanced processor to be a good target for the C/C++ compiler.

    HiFi 2 or HiFi EP?

    HiFi EP is a superset of HiFi 2 with advanced optimizations for DTS Master Audio, improved voice pre- and post-processing, and an improved cache memory subsystem.

    HiFi EP includes a new 32x24-bit MAC that delivers higher performance at lower power on the popular DTS Master Audio lossless decoder. This cuts the required clock rate by almost 35% compared to HiFi 2.

    Mobile and VoIP applications have increasingly demanding requirements for immunity to background noise and for speakerphone-mode quality. To address them, HiFi EP adds new instructions that accelerate voice pre- and post-processing for noise cancellation and beam-forming microphones. These instructions also provide better general DSP capabilities.

    The cache memory subsystem is enhanced with an integrated predictive prefetch unit that significantly improves performance in SOC designs with long external-memory latencies.

    Experiments Guided HiFi Audio DSP Development

    The HiFi designers experimented with various extensions, and performance results with different software codecs guided the architectural development. The Xtensa Processor Generator produces a tailored software-development tool suite for each new set of extensions in minutes. The designers therefore found it relatively easy to play "what if" games with the target application code and to assess extension alternatives objectively along several design dimensions at once.

    For example, the designers considered several alternative designs for the HiFi Audio DSP's MAC before finalizing it. The multipliers could have been designed for 24x24- or 32x16-bit operations, and the final MAC unit could have had one or two multipliers. A configuration allowing two 32x16-bit multipliers to be used as one 32x32-bit multiplier was also considered. Experiments with different MAC configurations produced the results shown in Table 2.

    MAC configuration | Maximum clock rate (MHz) | Gate count | Area (mm²)
    Single 24x24-bit MAC | 299 | 88,569 | 0.98
    Dual 24x24-bit MAC | 289 | 100,860 | 1.12
    Dual MAC supporting 24x24- and 32x16-bit operations | 284 | 101,408 | 1.13
    Dual MAC supporting 24x24-, 32x16-bit operations or a single 32x32-bit operation* | 270 | 110,012 | 1.22

    Results based on TSMC 130nm LV process, Artisan library.
    * Includes the MUL32 Xtensa LX configuration option, not used in the final HiFi design.

    Table 2: HiFi MAC options and experimental results

    Based on these results, the HiFi designers selected a dual-MAC configuration capable of 24x24-bit and 32x16-bit operations. This configuration provided the best audio-codec performance (the MAC instructions are especially good for complex multiplications and FIR filters) without seriously compromising the synthesized processor core's maximum clock rate. Note that this maximum clock rate is many times higher than the clock rate actually needed to run any one of the software audio codecs. This processing headroom allows the HiFi Audio DSPs to handle several other audio and control tasks on the SOC while also running one or more audio codecs concurrently.
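    Working through the Table 2 numbers shows how modest the cost of the selected configuration is relative to the single-MAC baseline:

$$
\frac{101{,}408 - 88{,}569}{88{,}569} \approx 14.5\%\ \text{more gates}, \qquad
\frac{299 - 284}{299} \approx 5.0\%\ \text{lower maximum clock rate}
$$

    The area grows by a similar (1.13 - 0.98)/0.98 ≈ 15.3%. By the same arithmetic, adding 32x16-bit support to the dual 24x24-bit MAC costs only 548 gates (about 0.5%) and 5 MHz. The 32x32-capable variant, by contrast, would have needed about 24.2% more gates than the baseline and given up about 9.7% of its clock rate.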

    Industry's Broadest Line of Audio Codecs

    Most consumer products being designed today need to run more than one audio codec (although not necessarily at the same time). Table 3 lists some of the proven audio codecs and other audio packages available for Xtensa processor cores with the HiFi 2 or HiFi EP Audio DSPs and for the 330HiFi Audio DSP. We're adding more all the time.

    Audio and Speech Packages for the HiFi Audio DSPs

    AM3D

    AMR Wideband Decoder and Encoder

    DAB/MP2 Decoder

    DAB+ Decoder

    Dolby Digital AC-3 Decoder, 5.1 ch

    Dolby Digital AC-3 Consumer Encoder 2, 5.1 ch

    Dolby Digital Compatible Output Encoder, 5.1 ch

    Dolby Digital Plus 5.1 ch Decoder/Converter

    Dolby Digital Plus 7.1 ch Decoder

    Dolby Prologic II Decoder

    Dolby MS10

    Dolby Digital TrueHD Decoder

    DTS Decoder

    MP3 Decoder and Encoder

    MPEG-4 aacPlus v2 Decoder 2, 7.1 ch

    MPEG-4 aacPlus v2 Encoder

    MPEG-4 aacPlus v1 Decoder 2, 7.1 ch

    MPEG-4 aacPlus v1 Encoder

    MPEG-2/4 AAC LC Decoder 2, 7.1 ch

    MPEG-4 AAC LC Encoder

    MPEG-4 BSAC Encoder

    Ogg Vorbis Decoder

    QSound MicroQ

    RealAudio 8, 9 and 10 Decoder

    SPIRIT DSP Audio and Voice Codecs

    SRS WOW XT, Xspace 3D and TruSound HD

    WMA Decoder

    WMA Encoder

    AMR Narrowband Speech Codec

    AMR Wideband Speech Codec

    G.729AB Speech Codec

    SPIRIT DSP Voice Codecs

    Table 3: Some of the audio packages available for Tensilica's HiFi Audio DSPs

    The list of audio software packages in Table 3 already leads the industry, and Tensilica is committed to keeping this lead by introducing new codecs and other audio application software as needed.

    Because each of these audio packages requires very little processing bandwidth, HiFi can run them at very low clock rates to save power. Alternatively, the processor can run several of these audio packages simultaneously at somewhat higher clock rates.

    All of the codecs listed in Table 3 are written in C. In fact, a primary HiFi design goal was to create an audio platform that could be programmed in C while delivering the desired real-time performance at low processor clock rates. This approach opens software development to a much larger programming audience than other vendors' audio solutions, which must be programmed in assembly language to achieve their performance goals. Because many more programmers are familiar with C than with assembly-language coding, SOC design teams using HiFi can draw on the much larger base of C/C++ programmers.

    HiFi Audio's Extensive Low-Power Features

    HiFi combines significantly lower per-MHz power consumption with architectural optimization of the instruction set. Together these deliver dramatic improvements in energy efficiency, increasing battery life (and therefore playing time) for portable and wireless applications. On TSMC's 65nm LP process, a minimal HiFi 2 configuration including memory dissipates as little as 66 µW/MHz of dynamic power and 69 µW of static power. Total power dissipation is only 0.45 mW while decoding a typical MP3 file at 5.7 MHz.
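    As a check on these figures, note that dynamic power scales with clock frequency while static (leakage) power does not. At 5.7 MHz:

$$
P_{\mathrm{total}} = 66\,\tfrac{\mu\mathrm{W}}{\mathrm{MHz}} \times 5.7\,\mathrm{MHz} + 69\,\mu\mathrm{W} \approx 376\,\mu\mathrm{W} + 69\,\mu\mathrm{W} \approx 445\,\mu\mathrm{W} \approx 0.45\,\mathrm{mW}
$$

    This matches the quoted total. Under this linear model, an instruction set that halves the cycles a codec needs also halves the roughly 376 µW dynamic term at the same supply voltage.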

    Tensilica's Xtensa LX configurable processor core and the 330HiFi Audio DSP, which is based on the Xtensa LX core, have many features that facilitate low-power operation. These include functional clock gating and a variety of power-down and sleep modes. Lower power and energy consumption are a key reason for using configurable processor cores and instruction-set extensions to run audio codecs. By adding appropriately tailored instructions to the processor's ISA (instruction-set architecture), the configured audio processor executes the target application code in far fewer cycles. As a result, the processor core can execute the codec at a greatly reduced clock frequency, which in turn cuts both power dissipation and energy consumption.

    Xtensa processor cores, including the 330HiFi Audio DSP, have two levels of clock gating. The first level is based on global conditions. For instance, the WAITI instruction puts an Xtensa processor into a sleep mode that turns off the clocks to nearly all of the processor's internal registers; an interrupt wakes the processor from sleep mode. In addition, external logic can use the processor's RunStall signal to stall the pipeline and turn off the clock to many of the processor's registers, which also saves power.

    The processor's second level of clock gating is functional clock gating. Xtensa processor cores, including the 330HiFi Audio Engine, contain hundreds of functional blocks, identified through trillions of simulation cycles exercising all of the

    to the system's output DACs via an output queue. FIFO-queue interfaces are therefore good I/O choices for system designs that incorporate audio. The queue interfaces separate the continuous flow of audio data from other bus traffic, freeing valuable system-bus bandwidth at a very low hardware cost.
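    A dedicated output queue behaves like a bounded FIFO. The processor pushes decoded samples whenever it has them, and the DAC side pops one sample per sample period. The ring buffer below is a single-threaded software model of that behavior with hypothetical names. It does not describe Tensilica's hardware queue interface or its programming API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8u   /* power of two so indices can wrap with a mask */

typedef struct {
    int32_t data[QUEUE_DEPTH];
    uint32_t head;   /* free-running count of samples popped */
    uint32_t tail;   /* free-running count of samples pushed */
} sample_queue;

/* Producer side: returns false when the queue is full. */
static bool queue_push(sample_queue *q, int32_t sample) {
    if (q->tail - q->head == QUEUE_DEPTH)
        return false;
    q->data[q->tail & (QUEUE_DEPTH - 1)] = sample;
    q->tail++;
    return true;
}

/* Consumer (DAC) side: returns false when the queue is empty. */
static bool queue_pop(sample_queue *q, int32_t *sample) {
    if (q->tail == q->head)
        return false;
    *sample = q->data[q->head & (QUEUE_DEPTH - 1)];
    q->head++;
    return true;
}
```

    A full queue makes `queue_push` fail (a hardware queue would typically stall the producer instead), and an empty queue makes `queue_pop` fail, which corresponds to a DAC underrun. The power-of-two depth lets the unsigned free-running counters wrap safely.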

    Conclusion

    Most ASIC and SOC design teams working with on-chip audio simply want to add audio as a drop-in component. Audio is certainly one of the SOC's important features, but design teams generally need to add value by spending development time and resources on other, product-specific features. Digital audio has become sufficiently standardized that it can now be added to an ASIC or SOC design as an off-the-shelf component.

    However, the only way to add audio as a component is to select a complete, ready-to-use audio solution. Many characteristics measure the completeness of an audio solution. Tensilica's HiFi Audio DSPs and the 330HiFi audio processor core are complete solutions that offer:

    The industry's broadest range of audio and voice codecs. Tensilica is committed to continually adding the latest audio and speech codecs needed by ASIC and SOC design teams.

    Low-power operation. The HiFi architecture allows existing and future codecs to operate at low clock rates, and therefore at low power and with reduced energy consumption. Some competing audio solutions are optimized for one specific codec, such as MP3, but perform poorly when running other audio codecs. Tensilica's HiFi Audio DSPs, with their 300+ audio instructions and audio-specific register files, are designed to run all audio and speech codecs efficiently.

    Easy design-in. Many factors determine how easy it will be to design a vendor's audio solution into an ASIC or SOC. Among them are the flexibility of the core, the interface flexibility of the audio block, simulation support, and programming support.

    Tensilica's hybrid RISC/audio-DSP approach provides more flexibility than any other in the industry. A 330HiFi processor core or a configurable Xtensa LX core with a HiFi 2 option can run Tensilica's long list of supported audio and speech codecs. It can also run new codecs and, critically, any other compiled C/C++ program. There simply is no more flexible approach.

    All Tensilica processor cores offer a variety of interface options, including conventional buses and FIFO-queue interfaces, which are unique in the industry. FIFO-queue interfaces are extremely efficient relative to bus-based interfaces and never suffer from contention (as shared buses do), which makes them ideal for SOC audio applications.

    All Tensilica processor cores incorporating HiFi Audio DSP extensions come with an instruction-set simulator (ISS) that works in a variety of simulation environments, including SystemC. In addition, Tensilica offers TurboXim, a fast functional ISS that is 40-80x faster than a conventional ISS.

    All Tensilica processor cores are accompanied by a comprehensive suite of programming and debugging tools tailored to the processor. XCC, Tensilica's vectorizing C/C++ compiler, understands all of the parallelism built into the HiFi architecture, including the dual MACs. The compiler's code generator therefore produces very compact, highly efficient code that achieves performance goals at low processor clock rates. As a result, audio codecs and other audio software can be developed entirely in C, which opens the coding task to a wider range of programmers and permits more rapid software development.

    Note: If you would like help adding digital audio to your next ASIC or SOC design, contact Tensilica for a consultation. You might find our Audio Reference Design application note valuable; see it on our web site at www.tensilica.com.