WHITE PAPER

Tensilica White Paper October 25, 2010 www.tensilica.com

Put Low-Power, Low-Overhead, High-Fidelity Digital Sound in Your Next ASIC or SOC

High-quality audio creates an immersive experience that excites buyers and spurs the purchase of consumer products such as home theater and PC sound systems, flat-panel televisions, handheld and console video games, portable music and video players, and mobile telephone handsets. As a result, digital audio has rocketed to the top of the critical features list for all sorts of products over the past several years. At the same time, the number of digital audio codecs (coders and decoders) and audio-enhancement programs has exploded, and most consumer products must support multiple codecs and offer a broad range of audio-enhancement features.

All of these factors have resulted in high demand for flexible, high-performance, low-power audio DSPs that add sound to an SOC design with the least amount of design effort and a small on-chip footprint. Tensilica's HiFi Audio DSP family was carefully crafted to meet these requirements for the broadest possible range of consumer products. The HiFi 2 and HiFi EP Audio DSPs are available as audio-extension packages for the Xtensa LX configurable processor core, and the HiFi 2 DSP is incorporated into the pre-configured 330HiFi processor core.

Why use programmable SOC hardware for audio?

The ASICs developed for the original digital music players did not use programmable MP3 decoders. Instead, they used dedicated hardware because that approach resulted in the lowest gate count. However, for products that must handle multiple digital audio codecs (and not just play MP3 files), the need for multiple hardware codecs quickly increases gate count, and dedicated hardware audio codecs lose their advantage. For products that must support two or more digital audio file formats, a programmable approach is far more attractive. Consequently, nearly all new ASIC and SOC designs incorporating digital audio employ some form of programmable DSP to run the audio codecs.


In addition to supporting multiple audio codecs, the programmable design approach actually reduces design risk because future changes in one of the existing codec standards or in the product's definition can be accommodated through a firmware change instead of a hardware change, which would require a new spin of the silicon. Also, the programmable nature of a processor-based approach makes it far easier to add application-specific pre- and post-processing to the end product. Such features include sample-rate conversion, multi-band frequency equalization, speaker virtualization, and parsing (audio content extraction from a file using various container formats).

Why not just use a RISC processor or DSP for audio?

General-purpose processors are often used to implement audio codecs because these processors can do anything that can be expressed in a firmware program, given enough execution cycles. With a sufficiently fast clock, general-purpose processors can implement multiple audio codec algorithms and they are therefore suitable for multipurpose devices that must accommodate several digital-audio standards. However, general-purpose processors are not especially efficient engines for running audio codecs, which means that they must run at higher clock rates to deliver real-time, multichannel audio. In turn, the higher clock rates cause these processors to consume more energy than do processors with audio-specific instruction sets. Higher power dissipation presents a real problem for portable, battery-powered devices.

DSPs generally run audio codecs more efficiently than do general-purpose processors because they have features, such as hardware MACs, that accelerate the execution of audio-specific code. However, DSPs have specialized and irregular architectures that make them poor targets for C compilers and, therefore, DSPs are not attractive as control processors. Consequently, DSPs are usually paired with a general-purpose control processor to create a design platform with sufficient capability and flexibility to be used for today's consumer products.

Dual-processor configurations that split the audio-processing duties across more than one processor, or across a processor and a hardware accelerator block, create their own design problems because the audio-processing tasks must be apportioned to each block. The audio operations running on each block must be coordinated, which increases software-development complexity. In addition, the two different processors (usually a general-purpose RISC processor and a DSP) must be programmed with dissimilar software-development tools, which further complicates software development.

System developers implementing audio codecs and other audio-processing software for ASICs and SOCs have another design alternative: application-tailored RISC processors, which start out as general-purpose processors capable of running any software program. Processor tailoring can add audio-specific extensions to the basic RISC processor architecture, including new instructions, registers, and data types. These audio-specific extensions accelerate code execution for the target audio applications (codecs and sound-shaping algorithms) while allowing the processor to remain a good C compiler target.


The HiFi 2 and HiFi EP Audio DSPs

Tensilica offers two HiFi audio DSP options for the Xtensa LX customizable processor. Just click the right configuration option box, and you'll add audio to your design. You can then customize other processor functions such as memories, interfaces, etc., as well as model your chip design and profile your software (more about that later).

HiFi 2 is Tensilica's workhorse audio DSP. It is suited for most applications. HiFi EP is a superset of HiFi 2 with advanced optimizations for DTS Master Audio, improved voice pre- and post-processing, and an improved cache memory subsystem. Because HiFi EP is a superset of HiFi 2, in this white paper when we talk about HiFi, we're talking about both DSP options as well as the 330HiFi pre-configured core. We'll be careful to call out the distinctive HiFi EP features where applicable.

The HiFi DSPs are options for Tensilica's Xtensa LX processor, which gives them a significant advantage over other DSPs on the market. The HiFi DSPs add processor extensions that deliver better audio performance by exploiting the added parallelism that the Xtensa LX architecture delivers through flexible-length instruction extensions (FLIX) and the associated XCC vectorizing C/C++ compiler, which automatically exploits the parallelism made available by the FLIX hardware extensions. FLIX instruction extensions allow processors based on the Xtensa LX architecture to issue multiple independent operations each clock cycle. Like all Xtensa instructions, FLIX instructions are first-class citizens of the processor's instruction set and are recognized and used by all of the processor's software-development tools, including its C/C++ compiler, assembler, debugger, and instruction-set simulator.

Tensilica's HiFi extensions comprise a set of more than 300 new audio-specific operations, grouped as shown in Table 1.

Group  HiFi 2 Operations
1      Loads and Stores
2      Single/Dual Multiply with 56-bit Accumulator
3      Scalar and 2-Way SIMD ALU Operations
4      Variable/Immediate Shifts
5      Convert/Round/Truncate/Saturate Operations
6      Huffman Encode/Decode and Bit-Stream Support

Table 1: The Six Operation Groups of the HiFi Audio DSPs

These HiFi instructions are teamed with two audio-specific register files: an 8-entry file named P with 48 bits/entry (each entry can hold two 24-bit audio values) and a 4-entry file named Q with 56-bit entries. The 56-bit values in the Q register file are generated from a set of instructions that control a dual multiplier/accumulator (MAC). Each of the two pipelined multipliers can perform a 24x24-bit or a 32x16-bit multiplication with a throughput of one multiplication per multiplier per cycle. The 56-bit results obtained from the two multipliers are accumulated in the Q registers. (Note: the 32x16-bit operation mode for the multipliers is particularly helpful for high-precision arithmetic.)
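The accumulate path described above can be sketched as a small behavioral model in Python. This is only an illustration, not Tensilica's implementation: the function names are hypothetical, the saturating accumulate is just one of the accumulation modes the MAC offers, and modeling both multipliers as feeding a single accumulator value in one step is an assumption made to keep the sketch short.

```python
ACC_BITS = 56                      # width of a Q register entry (from the text)
OPERAND_BITS = 24                  # width of one audio sample (from the text)

ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))


def check_operand(x, bits=OPERAND_BITS):
    """Reject values that do not fit in a signed `bits`-wide operand."""
    if not -(1 << (bits - 1)) <= x < (1 << (bits - 1)):
        raise ValueError(f"{x} does not fit in {bits} signed bits")
    return x


def saturate(value, lo=ACC_MIN, hi=ACC_MAX):
    """Clamp a result to the accumulator range (assumed saturating mode)."""
    return max(lo, min(hi, value))


def dual_mac(acc, a0, b0, a1, b1):
    """One modeled step: two 24x24-bit products added to a 56-bit accumulator."""
    for x in (a0, b0, a1, b1):
        check_operand(x)
    return saturate(acc + a0 * b0 + a1 * b1)


def dot_product(samples, coeffs):
    """FIR-style dot product that consumes two taps per modeled MAC step."""
    if len(samples) != len(coeffs) or len(samples) % 2:
        raise ValueError("expected equal, even-length inputs")
    acc = 0
    for i in range(0, len(samples), 2):
        acc = dual_mac(acc, samples[i], coeffs[i], samples[i + 1], coeffs[i + 1])
    return acc
```

Because a signed 24x24-bit product fits in 48 bits, a 56-bit accumulator leaves 8 guard bits, so at least 256 worst-case products can be summed before saturation comes into play.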

Group 1 includes operations that load data from and store data to the P and Q register files. These operations support immediate and indexed addressing modes with and without automatic address register updating.

Group 2 instructions drive the HiFi 2 Audio Engine's dual MAC. Audio codec software uses these instructions to perform audio-stream transforms between the time and frequency domains and for windowing and frequency-band splitting, sample-rate conversion, and special audio effects such as reverb and three-dimensional sound simulation.

Group 3 operations perform scalar arithmetic and Boolean functions on 56-bit words stored in the Q register file as well as 2-way SIMD arithmetic and Boolean functions on the 48-bit (paired 24-bit) words stored in the P register file.

Group 4 instructions include shift operations used for normalization, which maximizes the dynamic range of the HiFi 2 Audio Engine's fixed-point calculations.

Group 5 operations include conversion, rounding, truncation, and saturation functions for the P and Q register files. Operations in this group also permit data exchanges between the P and Q register files and the processor's base 32-bit register file.

Group 6 includes Huffman encoding and decoding operations (used for noiseless coding and decoding) and bit-stream support operations (used for efficient bit stream packing and unpacking).
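To illustrate what the bit-stream packing and unpacking in Group 6 involves, here is a minimal software sketch in Python. It only models the task that the HiFi operations accelerate. The class and function names, the MSB-first bit order, and the field widths in the usage line are assumptions chosen for illustration, and Huffman decoding is not shown.

```python
class BitReader:
    """Reads unsigned fields MSB-first from a bytes object (illustrative model)."""

    def __init__(self, data):
        self.data = bytes(data)
        self.pos = 0                      # current position, in bits

    def bits_left(self):
        return len(self.data) * 8 - self.pos

    def read(self, nbits):
        """Return the next `nbits` bits as an unsigned integer."""
        if nbits < 0 or nbits > self.bits_left():
            raise ValueError("not enough bits left in the stream")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def pack_fields(fields):
    """Pack (value, nbits) pairs MSB-first, zero-padding the last byte."""
    acc, total = 0, 0
    for value, nbits in fields:
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"{value} does not fit in {nbits} bits")
        acc = (acc << nbits) | value
        total += nbits
    pad = -total % 8                      # bits needed to reach a byte boundary
    return (acc << pad).to_bytes((total + pad) // 8, "big")
```

For example, `pack_fields([(0xFFF, 12), (1, 2), (3, 2)])` produces two bytes, and a `BitReader` over those bytes returns the three fields in order when read with the same widths.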

HiFi EP adds instructions that further optimize the multiply/accumulate (group 2), arithmetic (group 3), and shift (group 4) operations.
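The paired 24-bit layout of a P register entry and the normalization shifts of group 4 can be modeled in a few lines of Python. This is a hedged sketch rather than a description of the actual HiFi operations: the function names are hypothetical, and saturating each lane on overflow is an assumed behavior chosen for the example.

```python
LANE_BITS = 24                          # one audio sample (from the text)
LANE_MASK = (1 << LANE_BITS) - 1
LANE_MAX = (1 << (LANE_BITS - 1)) - 1
LANE_MIN = -(1 << (LANE_BITS - 1))


def pack_p(hi, lo):
    """Pack two signed 24-bit values into one 48-bit P-register image."""
    for x in (hi, lo):
        if not LANE_MIN <= x <= LANE_MAX:
            raise ValueError(f"{x} does not fit in 24 signed bits")
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def _signed(v):
    """Interpret a 24-bit field as a two's-complement value."""
    return v - (1 << LANE_BITS) if v & (1 << (LANE_BITS - 1)) else v


def unpack_p(word):
    """Split a 48-bit image back into two signed 24-bit values."""
    return _signed((word >> LANE_BITS) & LANE_MASK), _signed(word & LANE_MASK)


def _clamp(v):
    return max(LANE_MIN, min(LANE_MAX, v))


def simd_add_sat(p, q):
    """2-way SIMD add: each lane is added and saturated independently."""
    (a_hi, a_lo), (b_hi, b_lo) = unpack_p(p), unpack_p(q)
    return pack_p(_clamp(a_hi + b_hi), _clamp(a_lo + b_lo))


def headroom(x, bits=LANE_BITS):
    """Number of left shifts that keep x representable in `bits` signed bits."""
    if x == 0:
        return bits - 1
    magnitude = x if x > 0 else ~x       # ~x == -x - 1, which handles LANE_MIN
    return bits - 1 - magnitude.bit_length()


def normalize(x, bits=LANE_BITS):
    """Shift x left by its headroom; return (normalized value, shift count)."""
    shift = headroom(x, bits)
    return x << shift, shift
```

Normalizing a small value before a chain of fixed-point operations, and undoing the shift afterwards, keeps more significant bits in play, which is the dynamic-range benefit the group 4 description refers to.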

Very Un-RISC-Like but a Good Compiler Target

Adding 300+ audio-specific instructions to a RISC processor makes the compiled target application code very efficient (which was a primary goal in the design of the HiFi Audio DSP), but it is a very un-RISC-like concept. It would be prohibitively expensive, essentially impractical, to perform such application-specific processor tailoring without using an automated tool like Tensilica's Xtensa Processor Generator for constructing the processor cores and associated software-development tools.


A detailed examination of all of the HiFi operations in the six operation groups is available in the technical documentation, but a quick look at the HiFi MAC operation group illustrates the flexibility of the Xtensa LX architecture's FLIX-format instructions. Figure 1 illustrates the HiFi instruction format.

Format   Slot 1                     Slot 0
64-bit   Operation 1 (bits 63-27)   Operation 0 (bits 26-0)
24-bit                              Operation
16-bit                              Operation

Figure 1: The HiFi Instruction Formats

The MAC instructions can include the following functions, and very complex instructions can be built from these MAC primitives:

Single or dual multiplication

Fractional or integer arithmetic

24x24-bit, 16x16-bit, or 32x16-bit operands

Overwrite, add, or subtract accumulation with or without saturation

Signed or unsigned arithmetic

The base Xtensa LX RISC instructions use operation Slot 0 and are either 16 or 24 bits wide. The HiFi instructions generally use Slot 0 for load and store operations and for a few audio-processing DSP instructions that access the processor's base 32-bit register file. HiFi uses Slot 1 for most DSP operations.

Xtensa LX processors with a HiFi option and the 330HiFi core modelessly handle variable-width instruction streams. This feature allows the associated C/C++ compiler's code generator to select and freely intermix various instruction sizes to minimize the size of the compiled code. The compiler automatically selects the smallest instruction that will perform the required operation, which results in very compact code and avoids the code bloat generally associated with multi-operation VLIW processors.
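The code-size effect of freely mixing instruction widths is simple arithmetic, sketched below in Python. The three widths (16, 24, and 64 bits) come from the text. The instruction mix, the format labels, and the comparison against a machine that encodes every instruction in 64 bits are hypothetical and serve only to show the calculation.

```python
SIZES_BITS = {"narrow16": 16, "base24": 24, "flix64": 64}   # widths from the text


def code_bytes(mix, sizes=SIZES_BITS):
    """Total code size in bytes for a {format: instruction count} mix."""
    return sum(sizes[fmt] * count for fmt, count in mix.items()) // 8


def fixed_width_bytes(mix, width_bits=64):
    """Size of the same instruction count if every instruction were 64 bits."""
    return sum(mix.values()) * width_bits // 8


# Hypothetical instruction mix, chosen only to illustrate the arithmetic.
example_mix = {"narrow16": 400, "base24": 400, "flix64": 200}

mixed = code_bytes(example_mix)
fixed = fixed_width_bytes(example_mix)
print(f"mixed widths: {mixed} bytes, fixed 64-bit: {fixed} bytes")
```

With this made-up mix, the variable-width encoding needs 3,600 bytes against 8,000 bytes for a fixed 64-bit encoding. Real savings depend entirely on the instruction mix of the compiled code.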

Figure 2 provides a simplified view of the HiFi register files and data path, organized visually to match the 2-slot FLIX instruction format shown in Figure 1. The 16- and 24-bit instructions share the base RISC processor's execution hardware (registers and pipeline) with operations in the lower operation slot (called Slot 0). The rest of the datapath extensions occupy Slot 1.


Figure 2: HiFi Register Files and Datapath

The base RISC processor instructions and the HiFi Audio DSP's 16- and 24-bit instructions only control resources assigned to Slot 0, while the 64-bit HiFi instructions can control all the processor resources in both slots. For 64-bit instructions, the operation residing in the lower part of the 64-bit instruction word controls processor resources assigned to Slot 0 and the operation in the upper part of the instruction word (labeled Operation 1 in Figure 1) controls resources assigned to Slot 1. It's important to note that there are two operation slots but there is only one processor, which keeps the programming model simple and allows the audio-enhanced processor to be a good target for the C/C++ compiler.

HiFi 2 or HiFi EP?

HiFi EP is a superset of HiFi 2 with advanced optimizations for DTS Master Audio, improved voice pre- and post-processing, and an improved cache memory subsystem.

HiFi EP includes a novel and unique 32x24 MAC for higher performance at lower power on the popular DTS Master Audio lossless decoder, resulting in a clock rate reduction of almost 35% compared to HiFi 2.

To address the increasingly demanding requirements in mobile and VoIP applications for better immunity to background noise and speakerphone mode quality, new instructions have been added to accelerate voice pre- and post-processing for noise cancellation and beam-forming microphones. These instructions also provide better general DSP capabilities.


The cache memory subsystem is enhanced with an integrated predictive prefetch unit to significantly improve performance in SOC designs with large external memory latencies.

Experiments Guided HiFi Audio DSP Development

The HiFi designers experimented with various extensions, and performance results with different software codecs guided architectural development. Because the Xtensa Processor Generator produces a tailored software-development tool suite for each new set of extensions in minutes, the designers found it was relatively easy to play "what if" games with the target application code and to objectively assess various extension alternatives along several simultaneous design dimensions.

For example, designers considered several alternative designs for the HiFi Audio DSP's MAC before finalizing the design. The multipliers could have been designed to be capable of 24x24- or 32x16-bit operations and the final MAC unit could have been designed with one or two multipliers. A configuration allowing two 32x16-bit multipliers to be used as one 32x32-bit multiplier was also considered. Experiments with different MAC configurations produced the results shown in Table 2.

MAC configuration | Maximum Clock Rate (MHz) | Gate Count | Area (mm²)
Single 24x24-bit MAC | 299 | 88,569 | 0.98
Dual 24x24-bit MAC | 289 | 100,860 | 1.12
Dual MAC supporting 24x24- and 32x16-bit operations | 284 | 101,408 | 1.13
Dual MAC supporting 24x24-, 32x16-bit operations or a single 32x32-bit operation | 270 | 110,012 | 1.22

Based on TSMC 130nm LV process, Artisan library, includes MUL32 Xtensa LX configuration option not used in the final HiFi design

Table 2: HiFi MAC options and experimental results

Based on these results, the HiFi designers selected a dual-MAC configuration capable of 24x24-bit and 32x16-bit operations because this configuration provided the best audio-codec performance (the MAC instructions are especially good for complex multiplications and for FIR filters) without seriously compromising the synthesized processor core's maximum clock rate. Note that this maximum clock rate is many times higher than the actual clock rate needed to run any one of the software audio codecs. This processing headroom allows the HiFi Audio DSPs to handle several other audio and control tasks on the SOC while also running one or more audio codecs concurrently.
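The trade-off behind this choice can be checked directly from the Table 2 figures. The short Python sketch below computes each option's change in maximum clock rate and gate count relative to the single-MAC design. The numbers are copied from Table 2; the short row labels and the function name are introduced here for illustration only.

```python
# Rows copied from Table 2: (max clock in MHz, gate count, area in mm^2).
MAC_OPTIONS = {
    "single 24x24": (299, 88_569, 0.98),
    "dual 24x24": (289, 100_860, 1.12),
    "dual 24x24 + 32x16": (284, 101_408, 1.13),
    "dual + single 32x32": (270, 110_012, 1.22),
}


def relative_to_baseline(option, baseline="single 24x24", table=MAC_OPTIONS):
    """Return (clock change %, gate-count change %) versus the baseline row."""
    clk, gates, _ = table[option]
    base_clk, base_gates, _ = table[baseline]
    clock_pct = 100.0 * (clk - base_clk) / base_clk
    gates_pct = 100.0 * (gates - base_gates) / base_gates
    return clock_pct, gates_pct


for name in MAC_OPTIONS:
    clock_pct, gates_pct = relative_to_baseline(name)
    print(f"{name:22s} clock {clock_pct:+5.1f}%  gates {gates_pct:+5.1f}%")
```

By these figures, the selected configuration gives up about 5% of maximum clock rate and adds about 14.5% more gates than the single-MAC design, and adding 32x16-bit support to the dual 24x24-bit design costs only 548 gates and 5 MHz. The 32x32-capable option costs nearly 10% in clock rate and about 24% in gates.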


Industry's Broadest Line of Audio Codecs

Most consumer products being designed today need to run more than one audio codec (although not necessarily at the same time). Table 3 lists some of the proven audio codecs and other audio packages available for Xtensa processor cores with the HiFi 2 or EP Audio DSPs and the 330HiFi Audio DSP. We're adding more all the time.

Audio and Speech Packages for the HiFi Audio DSPs

AM3D
AMR Wideband Decoder and Encoder
DAB/MP2 Decoder
DAB+ Decoder
Dolby Digital AC-3 Decoder, 5.1 channel
Dolby Digital AC-3 Consumer Encoder 2, 5.1 ch
Dolby Digital Compatible Output Encoder, 5.1 ch
Dolby Digital Plus 5.1 ch Decoder/Converter
Dolby Digital Plus 7.1 ch Decoder
Dolby Pro Logic II Decoder
Dolby MS10
Dolby Digital TrueHD Decoder
DTS Decoder
MP3 Decoder and Encoder
MPEG-4 aacPlus v2 Decoder 2, 7.1 ch
MPEG-4 aacPlus v2 Encoder
MPEG-4 aacPlus v1 Decoder 2, 7.1 ch
MPEG-4 aacPlus v1 Encoder
MPEG-2/4 AAC LC Decoder 2, 7.1 ch
MPEG-4 AAC LC Encoder
MPEG-4 BSAC Encoder
Ogg Vorbis Decoder
QSound MicroQ
RealAudio 8, 9 and 10 Decoder
SPIRIT DSP Audio and Voice Codecs
SRS WOW XT, Xspace 3D and TruSound HD
WMA Decoder
WMA Encoder
AMR Narrowband Speech Codec
AMR Wideband Speech Codec
G.729AB Speech Codec
SPIRIT DSP Voice Codecs

Table 3: Some of the audio packages available for Tensilica's HiFi Audio DSPs

The long list of audio software packages in Table 3 already leads the industry, and Tensilica is committed to keeping this lead through the introduction of new codecs and other audio application software when needed.


Because each of these audio packages requires very little processing bandwidth, HiFi can run these audio codecs at very low clock rates to save power. Alternatively, the processor can run several of these audio packages simultaneously at somewhat higher clock rates.

All of the codecs listed in Table 3 are written in C. In fact, a primary HiFi design goal was to create an audio platform that could be programmed in C while delivering the desired real-time performance at low processor clock rates. This approach opens software development to a much larger programming audience than other vendors' audio solutions (which must be programmed in assembly language to achieve performance goals). Because many more programmers are familiar with C than with assembly-language coding, SOC design teams can draw on the much larger base of C/C++ programmers by using HiFi.

HiFi Audio's Extensive Low-Power Features

Through a combination of significantly lower per-MHz power consumption and architectural optimization of the instruction set, HiFi delivers dramatic improvements in energy efficiency that result in increased battery life (and therefore more playing time) for portable and wireless applications. Based on TSMC's 65nm LP process and a minimal HiFi 2 configuration including memory, dynamic and static power dissipation can be as low as 66 µW/MHz and 69 µW, respectively. Total power dissipation is only 0.45 mW while decoding a typical MP3 file at 5.7 MHz.
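The quoted total follows from the dynamic and static figures when they are read as microwatts, which is the reading under which they add up to the stated 0.45 mW. The Python sketch below shows the arithmetic. The function name is hypothetical, and the simple linear model (dynamic power proportional to clock rate plus a constant static term) is an assumption that happens to reproduce the stated number.

```python
def total_power_uw(clock_mhz, dynamic_uw_per_mhz=66.0, static_uw=69.0):
    """Total power in microwatts: the dynamic part scales with clock rate,
    the static part does not (assumed linear model)."""
    return dynamic_uw_per_mhz * clock_mhz + static_uw


mp3_power_uw = total_power_uw(5.7)          # MP3 decode clock rate from the text
print(f"{mp3_power_uw:.1f} uW = {mp3_power_uw / 1000:.2f} mW")
```

At 5.7 MHz this gives 66 x 5.7 + 69 = 445.2 µW, or about 0.45 mW. The same function shows why lowering the clock rate matters: at such low clock rates the dynamic term still accounts for most of the total.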

Tensilica's Xtensa LX configurable processor core and Tensilica's 330HiFi Audio DSP, which is based on the Xtensa LX core, have many features that facilitate low-power operation, including functional clock gating and a variety of power-down and sleep modes. Lower power and energy consumption is a key reason for using configurable processor cores and instruction-set extensions to run audio codecs. By adding appropriately tailored instructions to the processor's ISA (instruction-set architecture), the configured audio processor executes the target application code in many fewer cycles. As a result, the processor core can execute the codec at a greatly reduced clock frequency, which in turn cuts both power dissipation and energy consumption.

Xtensa processor cores including the 330HiFi Audio DSP have two levels of clock gating. The first level of clock gating is based on global conditions. For instance, the WAITI instruction allows an Xtensa processor to enter a sleep mode that turns off the clocks to nearly all of the processor's internal registers. An interrupt wakes the processor from sleep mode. In addition, the processor's RunStall signal can still be used to save power by allowing external logic to stall the processor pipeline and turn off the clock to many of the processor's registers.

The processor's second level of clock gating is functional clock gating. Xtensa processor cores including the 330HiFi Audio Engine contain hundreds of functional blocks, identified through trillions of simulation cycles exercising all of the


to the system's output DACs via an output queue. FIFO-queue interfaces are therefore good I/O choices in system designs that incorporate audio: the queue interfaces separate the continuous flow of audio data from other bus traffic, freeing valuable system-bus bandwidth at a very low hardware cost.
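The decoupling that an output queue provides can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not a model of the actual Xtensa queue interface: the class and function names, the queue depth, the stall-on-full and underrun-on-empty behavior, and the step-based timing are all hypothetical.

```python
from collections import deque


class OutputQueue:
    """Bounded FIFO between a decoder (producer) and a DAC (consumer)."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self.fifo = deque()

    def push(self, sample):
        """Producer side: returns False (assumed stall) when the queue is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(sample)
        return True

    def pop(self):
        """Consumer side: returns None (assumed underrun) when empty."""
        return self.fifo.popleft() if self.fifo else None


def run(samples, depth, dac_period):
    """The producer pushes one sample per step when it can; the DAC pops one
    sample every `dac_period` steps. Returns the samples the DAC received."""
    if dac_period < 1:
        raise ValueError("dac_period must be at least 1")
    queue, out, pending, step = OutputQueue(depth), [], list(samples), 0
    while pending or queue.fifo:
        if pending and queue.push(pending[0]):
            pending.pop(0)
        if step % dac_period == dac_period - 1:
            sample = queue.pop()
            if sample is not None:
                out.append(sample)
        step += 1
    return out
```

In this model the samples reach the DAC in order and at the DAC's own pace without ever touching a shared bus, which is the separation of audio traffic from other bus traffic that the text describes.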

Conclusion

Most ASIC and SOC design teams working with on-chip audio simply want to add audio as a drop-in component. Audio is certainly one of the SOC's important features, but design teams generally need to add value by spending development time and resources on other product-specific features. Digital audio has become sufficiently standardized so that it can now be added to the ASIC or SOC design as an off-the-shelf component.

However, the only way to add audio as a component is to select a complete, ready-to-use audio solution. There are many characteristics used to measure the completeness of an audio solution. Tensilica's HiFi Audio DSPs and the 330HiFi audio processor core are complete solutions that offer:

The industry's broadest range of audio and voice codecs. Tensilica is committed to continually adding the latest audio and speech codecs needed by ASIC and SOC design teams.

Low-power operation. The HiFi architecture allows existing and future codecs to operate at low clock rates and therefore at low power and with reduced energy consumption. Some competing audio solutions are optimized for one specific codec, say MP3, but perform poorly when running other audio codecs. Tensilica's HiFi Audio DSPs with their 300+ audio instructions and audio-specific register files are designed to run all audio and speech codecs efficiently.

Easy design in. Many factors determine how easy it will be to design a vendor's audio solution into an ASIC or SOC. Among these are the flexibility of the core, the interface flexibility of the audio block, simulation support, and programming support.

Tensilica's hybrid RISC/audio-DSP approach provides more flexibility than any other in the industry. A 330HiFi processor core or a configurable Xtensa LX core with a HiFi 2 option can run Tensilica's long list of supported audio and speech codecs as well as new codecs and, critically, any other compiled C/C++ program. There simply is no more flexible approach.

All Tensilica processor cores offer a variety of interface options including conventional buses and FIFO-queue interfaces, which are unique in the industry. FIFO-queue interfaces are extremely efficient relative to bus-based interfaces and never suffer from contention (as do shared buses), which makes FIFO-queue interfaces ideal for SOC audio applications.


All Tensilica processor cores incorporating HiFi Audio DSP extensions come with an instruction-set simulator (ISS) that works in a variety of simulation environments including SystemC. In addition, Tensilica offers TurboXim, a fast functional ISS that is 40-80x faster than a conventional ISS.

All Tensilica processor cores are accompanied by a comprehensive suite of programming and debugging tools carefully tailored to the processor. XCC, Tensilica's vectorizing C/C++ compiler, understands all of the parallelism built into the HiFi architecture including the dual MACs, so the compiler's code generator produces very compact, highly efficient code that achieves performance goals at low processor clock rates. As a result, audio codecs and other audio software can be completely developed in C, which opens the coding task to a wider range of programmers and permits more rapid software development.

Note: If you would like help adding digital audio to your next ASIC or SOC design, contact Tensilica for a consultation. You might find our Audio Reference Design application note valuable. See it on our web site at www.tensilica.com.
