

Abstract: This paper describes a method of converting floating-point expressions into equivalent fixed-point code in DSP software. Replacing floating-point expressions with specialized integer operations can greatly improve the performance of embedded applications. The method is developed for Direct-Form I filters with constant coefficients and input variables whose low/high bounds are known. Two conflicting objectives are considered simultaneously: computational complexity and accuracy loss. The algorithm presented here can construct multiple fixed-point solutions for the same floating-point code, from high-complexity-high-accuracy to low-complexity-low-accuracy. A cost function drives the data flow transformation decisions; by changing its coefficients, different fixed-point forms can be obtained. The data flow transformation takes very little time: less than 100 milliseconds for a 32-tap FIR filter. The generated fixed-point code is tested on 8-bit (AVR ATmega), 16-bit (MSP430), and 32-bit (ARM Cortex-M3) microcontrollers. In all cases it executes faster than the equivalent floating-point code.

I. INTRODUCTION

Floating-point code is often inappropriate for embedded applications. The computing capabilities of microcontrollers are generally limited and, in most cases, no hardware support for floating-point operations is provided. To overcome this problem, the mathematical function contained in the floating-point code must be expressed with fixed-point code. Doing this manually, that is, rewriting a floating-point function by hand as a sequence of integer operations, can be a difficult task.

II. RELATED WORK

There has been a significant effort to develop frameworks that automate the conversion of floating-point code to integer code [1]-[4]. Two distinct approaches can be identified: statistical (simulation-based) and analytical. The difference between them lies in the way the dynamic intervals of variables are computed. A statistical method performs a series of simulations and may require a significant amount of time. An analytical method is necessarily based on a concrete data model (for example, propagation rules) and can give precise information in a very short time.

One of the first floating-point to fixed-point converters, AUTOSCALER for C, is described in [1]. It is able to optimize the number of shift operations by equalizing the word length of specific variables or constants. In [2] a method is presented that performs CDFG optimizations under accuracy constraints; it makes extensive use of the equations representing the system. In [4] a genetic algorithm is described that is employed to find the optimal trade-off between signal quality and implementation complexity.

This paper is based on previous work detailed in [5]. Paper [5] addresses the same task as this paper, but is primarily focused on generating ANSI C compliant code.

III. METHOD OVERVIEW

The method presented in this paper is designed to transform dot products with constant coefficients (floating-point literals) and integer variables with known intervals:

\sum_{i=0}^{N} a_i x_i \qquad (1)

    Failing to state the correct intervals of the integer

    variables can lead to erroneous results. The manipulation of

    intervals [6] is central to the optimization procedure.
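The starting point of the method, expression (1), can be written in C as a plain floating-point dot product. The coefficients and the input bound below are hypothetical, chosen only to illustrate the kind of expression being converted:

```c
#include <assert.h>

/* A dot product of the form (1): constant coefficients a[i] and
   integer inputs x[i] with known bounds. The coefficients and the
   input interval are hypothetical, for illustration only. */
#define N 4

static const float a[N] = { 0.25f, -0.5f, 0.125f, 0.75f };

/* x[i] is assumed to lie in [0; 1023]. */
float dot_float(const int x[N]) {
    float acc = 0.0f;
    for (int i = 0; i < N; i++)
        acc += a[i] * (float)x[i];
    return acc;
}
```

This is the floating-point reference form; the method replaces it with an equivalent sequence of integer operations.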

The following types of nodes are used to represent the data flow:

Stand-alone nodes: nodes whose values do not depend on other nodes. Stand-alone nodes are used to represent constants and parameters: a_i, x_i, etc.

Operators: add, multiply, shift, and change sign. An operator has one or more operands (child nodes). These can be stand-alone nodes or other operators.

A node has an associated interval and fractional word length (FWL). The interval represents the extreme values of the node's run-time integer (the memory or register variable). Scaling a node is a frequent operation encountered in the optimization process. Note: the FWL of a node is an integer value. A scaling operation does not change the fixed-point (or real) value of a node. The node interval is always altered together with the node FWL.

A node can be realized in code as a 16-bit or 32-bit integer. The values that pass through a node should carry as many significant bits as possible, to preserve precise information, but, on the other hand, must be limited to a specific interval. The FWL is not necessarily the same for every node.
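The relationship between a node's raw integer, its interval, and its FWL can be sketched as follows. The struct layout and names are illustrative, not the paper's Java implementation; they show how a right shift changes the interval and the FWL together while preserving the represented real value:

```c
#include <assert.h>
#include <stdint.h>

/* A node value in this sketch is a raw integer plus a fractional
   word length (FWL): real value = raw / 2^fwl. */
typedef struct {
    int32_t raw;  /* run-time integer */
    int fwl;      /* fractional word length */
} node_t;

/* Scale a node down by s bits: the raw integer (and hence the node
   interval) shrinks, the FWL decreases with it, and the represented
   real value stays the same up to rounding. */
node_t scale_down(node_t n, int s) {
    node_t r = { n.raw >> s, n.fwl - s };
    return r;
}

/* Real (fixed-point) value represented by a node. */
double node_value(node_t n) {
    return (double)n.raw / (double)(1L << n.fwl);
}
```

For example, the value 0.75 stored as 3072 with FWL 12 remains 0.75 after scaling down by 4 bits (raw 192, FWL 8), while the integer interval has shrunk by a factor of 16.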

    The data flow structure is modified in steps. A step can

    be viewed as an inference operation:

Floating-point to fixed-point code conversion with variable trade-off between computational complexity and accuracy loss

Alexandru Bârleanu, Vadim Bitoiu and Andrei Stan, Member, IEEE


1. Effect. The integer interval of a node must be decreased or increased.

2. List of possible causes. A list of candidate data flow transformations is constructed.

3. Best cause selection. The optimal data flow transformation is selected with the help of a cost function whose coefficients represent, in essence, the importance given to the computational effort and to the accuracy loss.

The method described in this paper is implemented in Java (mostly because of Java's support for object-oriented programming and the advanced IDEs available).

IV. DATA FLOW TRANSFORMATION

A. Problem Difficulty

The initial form of the data flow is a true image of the floating-point dot product expression. There is one add operator with N child nodes: a multiply operator for each a_i x_i term, as in (1).

Node a_i has a very long fractional part (24 bits) if the floating-point literals of the dot product expression are parsed as single-precision values. If node x_i has, for example, a run-time interval equal to [0; 1023], then the multiply node overflows. Four data types are permitted for a node: signed/unsigned 16-bit and 32-bit integers. To prevent the multiply node from overflowing, its integer interval must be made smaller (the fractional part must be decreased). There are two possibilities: shift a_i to the right at design time or shift x_i to the right at run time. Each solution has its own impact on the computational complexity and accuracy of the data flow. In this case it is simple to decide which solution to select: discarding some least significant bits from the very precise constant at design time incurs no run-time overhead and causes less accuracy loss than the right shift of x_i. But this is a rare case.
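The design-time shift can be sketched as follows, assuming a hypothetical coefficient of 0.9 parsed with a 24-bit fractional part and an input interval of [0; 1023]:

```c
#include <assert.h>
#include <stdint.h>

/* Design-time right shift of a constant coefficient so that the
   product a_i * x_i fits a signed 32-bit integer for x_i in
   [0; 1023]. The coefficient 0.9 is an illustrative value. */

/* 0.9 with a 24-bit fractional part: round(0.9 * 2^24). */
#define A_Q24 15099494L

/* 15099494 * 1023 would exceed INT32_MAX, so 3 least significant
   bits of the constant are discarded at design time (FWL 24 -> 21).
   This costs nothing at run time. */
#define A_Q21 (A_Q24 >> 3)

/* Run-time multiply; the result has a 21-bit fractional part. */
int32_t mul_q21(int32_t x) {
    return (int32_t)(A_Q21 * x);
}
```

With the worst-case input x = 1023 the product is 1930847028, which fits a signed 32-bit integer, whereas the unshifted constant would have overflowed.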

In the general case, a solution is either low-complexity-low-accuracy or high-complexity-high-accuracy (not low-complexity-high-accuracy). This makes it difficult to compare candidate solutions. It is necessary to determine the complexity and accuracy of a particular solution quantitatively. There is no other way, because the number of alternative possibilities grows very quickly with the size of the data flow area below the node whose integer interval must be modified.

B. Computational Complexity

A data flow node has an associated computational complexity. This is an estimator of the computational effort required to obtain the node value at run time. At design time the exact computational complexity is difficult to evaluate. The Java application simply counts the operators contained in the data flow area below the target node. This is a sufficiently good approximation.

C. Node Error Interval (Drift)

Every data flow node stands for a fixed-point value which can vary within a specific interval. This interval refers to the node value at run time, which is in essence an integer very close to the ideal infinite-precision value. Thus, every node has a specific error. The interval of the infinite-precision value that can pass through a node, and the corresponding interval of the run-time integer, can be calculated at design time, which means that, for each node, the interval of the error can be obtained without actually running the code.

The error interval of an operator node can be calculated using the integer interval and the error interval of every child node [6]. For example, the error interval of an add node can be calculated by adding the error intervals of its child nodes.

The low/high values of an error interval are considered to be absolute values, not relative values or, in other words, units in the last place (ulps) [7].
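The propagation rule for add nodes can be sketched as follows; the interval type and the function name are illustrative, not taken from the paper's implementation:

```c
#include <assert.h>

/* Error (drift) interval of a node: absolute low/high error bounds. */
typedef struct {
    double lo, hi;
} interval_t;

/* Error interval propagation for an add node: the error interval
   of the sum is the sum of the child error intervals. */
interval_t add_drift(interval_t a, interval_t b) {
    interval_t r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}
```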

D. Multi-Objective Search

The simplest way to decrease or increase the integer interval of a node is to perform a shift operation. This can be done at design time if the node represents a constant value, or at run time if the node represents an operator and (very important) the integer interval is valid (does not overflow). But these are not frequent cases. The most usual situation is that an operator overflows and its integer interval must be decreased. The problem is that the integer interval of an operator cannot be changed directly; the integer intervals of the child nodes must be altered. To force the integer interval of an add node it is necessary to force the integer interval of every child node (logical AND). To force the integer interval of a multiply node it is necessary to force one or more child nodes (logical OR).

In the general case, there are multiple ways to increase or decrease the integer interval of a node (logical OR). One possible way is called a solution. A solution involves a number of data flow changes (logical AND). A change can be viewed, in the simplest way, as a node switch: the child node of an operator is replaced with another child node. A change is invertible: a change can be applied and can be undone. This is a very important feature. Because a change is always part of a solution, it makes sense to say that a solution is applied or undone (meaning that all the changes included are applied or undone).

Multiple solutions can be viewed as concurrent if all of them are built with the same purpose, for example, to make the integer interval of a specific node smaller. But


each solution consists of a number of particular (different) changes. Thus each solution has its own computational complexity and influence on accuracy. The Java application compares concurrent solutions by these two metrics. To evaluate the complexity and error interval of a solution, the solution is applied (some child nodes are disconnected and others are connected).

This algorithm step (switching between different solutions) is essentially a search. The method described in this paper resembles other methods with regard to the way the data flow is implemented (types of nodes) and the usage of operator properties (value propagation). From this point of view, the method described here can be considered analytical. But it still performs a search. It can be regarded as search-based, yet it is very different from other search-based methods: the method described in this paper connects and disconnects various data flow fragments, while other methods scan very large multi-dimensional spaces that represent fractional word lengths.

    To compare several concurrent solutions it is necessary to

    combine, for each solution, the complexity and error

    interval into a single indicator. In order to do this, a linear

    function is used:

\mathrm{cost} = k_1 \cdot \mathrm{complexity} + k_2 \cdot \mathrm{error} \qquad (2)

Varying the cost function coefficients can, for example, favor solutions which introduce considerable computational overhead but give high-accuracy results, in place of low-complexity-low-accuracy solutions.

    Although the cost function has two parameters, the

    variation space is one-dimensional. The cost function can be

    represented geometrically as a line which passes through the

    origin point in a two-dimensional space (Fig. 1).

    Fig. 1: Solution space

In Fig. 1 the cost of one solution is directly proportional to the shortest distance to the cost function line. The complexity coefficient (k1) and the error coefficient (k2) together determine the slope of the cost function line. Two coefficients are used because otherwise it would be impossible to represent the vertical line (+INF slope). For simplicity, the sum of the cost coefficients is kept constant:

k_1 + k_2 = 1 \qquad (3)

The cost of one solution has no meaning if considered separately. It makes sense only in comparison with the costs of other solutions.
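A minimal sketch of the cost function (2), with complementary coefficients as in (3). The solution fields and function names are illustrative:

```c
#include <assert.h>

/* A candidate solution, summarized by its two metrics. */
typedef struct {
    double complexity;  /* operator count below the node */
    double error;       /* magnitude of the drift interval */
} solution_t;

/* Linear cost function (2); the coefficients are complementary
   as in (3), so only k1 needs to be given. */
double cost(solution_t s, double k1) {
    double k2 = 1.0 - k1;
    return k1 * s.complexity + k2 * s.error;
}

/* Costs are only meaningful relative to each other: return 1 if
   the first of two concurrent solutions is cheaper. */
int prefer_first(solution_t a, solution_t b, double k1) {
    return cost(a, k1) < cost(b, k1);
}
```

With k1 = 1 the comparison looks only at complexity, so a low-complexity-low-accuracy solution wins; with k1 = 0 only the error matters and the high-accuracy solution wins.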

E. Transformation Example

Fig. 2 shows two extreme data flow structures obtained for a dot product with 12 terms. (Such images can be created with Graphviz software.)

Fig. 2: High-level view of two extreme data flow structures obtained for a dot product expression with 12 terms: low-complexity-low-accuracy (left) and high-complexity-high-accuracy (right).

In Fig. 2, the data flow on the left-hand side has a specific pattern. The fractional word length is the same for most of the nodes. The number of operators is minimal (complexity = 26). In contrast, the data flow on the right-hand side is highly developed and does not have a specific pattern (at the global level). Some nodes have very long fractional parts (which is not visible in the figure). The number of operators is maximal (complexity = 53).

V. DESIGN-TIME TECHNIQUES

A. Node Cache Information

The optimization process makes extensive use of node attributes like the integer interval and the drift. For operator nodes this information depends on the child nodes (operands) and must be computed. The time required for this can become significant for large data flows: the high nodes generate a lot of subsequent calls to nodes located below to get the necessary information. This traffic can be reduced. The data flow structure is not itself very dynamic: a change that is applied in the optimization process has a limited impact area. In many cases the integer interval and drift information can be reused. For this


purpose, each operator node is designed with its own cache. In this way, obtaining the integer interval or the drift information can be very fast, unless the cache of the target node has been invalidated. The invalidation of the cache is crucial. Doing this for fewer nodes than required may lead to erroneous results, and doing it for more nodes than required can lower the cache hit rate.

The cache invalidation is triggered in the following manner: whenever a node N is connected to an operator F, a message of change is propagated along the chain of operators from F to the root operator to invalidate the corresponding cache data.
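The invalidation walk can be sketched in C as follows; the node structure is illustrative (the paper's implementation is in Java):

```c
#include <assert.h>
#include <stddef.h>

/* An operator node with a parent link and a cache flag for its
   computed interval/drift information. */
typedef struct op_node {
    struct op_node *parent;  /* NULL at the root operator */
    int cache_valid;         /* cached interval/drift usable? */
} op_node_t;

/* Propagate a change message from operator f up to the root,
   marking every cache on the way as stale. */
void invalidate_up(op_node_t *f) {
    for (op_node_t *n = f; n != NULL; n = n->parent)
        n->cache_valid = 0;
}

/* Small self-check: a three-operator chain, invalidated from the
   bottom operator; returns how many caches were invalidated. */
int demo_invalidated_count(void) {
    op_node_t root = { NULL, 1 };
    op_node_t mid  = { &root, 1 };
    op_node_t leaf = { &mid, 1 };
    invalidate_up(&leaf);
    return (!root.cache_valid) + (!mid.cache_valid) + (!leaf.cache_valid);
}
```

Only the chain from the changed operator to the root is touched, which is what keeps the impact area of a change limited.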

The design time is reduced considerably with node cache information. This is easy to observe as the filter length is increased. Without caching, transforming the data flow of a dot product with 16 terms can take more than 10 seconds. With the cache mechanism turned on, the data flow is optimized in tens of milliseconds.

Fig. 3 shows the execution time of the optimization procedure for dot products with different lengths (node cache information is used).

    Fig. 3: Average data flow transformation time

B. Automatic Search of Data Flows

Varying the coefficients of the cost function leads to different data flow structures. The coefficients can be set by hand, but this is not very practical, because the coefficients themselves do not carry very much information (except for the extreme cases). In a concrete situation it might be more desirable to generate all the possible data flows, create code for all of them, and later select the most convenient function.

From a high-level point of view, the search method, whatever it is, should traverse the one-dimensional search space from 0 to 1, generate various data flows, and pick up the unique ones. The ideal search method should generate as few equivalent data flows as possible. Two data flows are considered equivalent if, while traversing both structures depth-first in parallel, every node that is encountered has the same type (add, multiply), the same integer interval (low/high values, fractional length) and the same number of child nodes as its mirror node.

Performing a sequential search can be very time-consuming. Varying a coefficient from 0 to 1 using a constant step and generating all the corresponding data flows can be very inefficient. A large number of the data flow structures are equivalent, and the generation of a single one requires a significant amount of time: for example, a dot product with 10 terms requires 10-15 milliseconds. On the other hand, the increment step has to be small enough to capture all the possible data flow structures.

Fortunately, it is possible to perform a more selective search. It is just necessary to use the following context information: if the end-points of a segment inside the search space generate the same data flow structure, then it does not make sense to sweep this particular segment; no new data flow structures can be discovered in this area. But if the end-points of a segment generate different data flows, then the segment should be halved and the same procedure should be applied further to the resulting segments. This method is very efficient, because the number of discarded data flow structures is minimal.
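The segment-halving search can be sketched as follows. The signature() function below is a hypothetical stand-in for "transform the data flow with this coefficient and summarize the resulting structure"; here it simply models three distinct data flows over [0, 1]:

```c
#include <assert.h>

/* Hypothetical stand-in: maps a complexity coefficient to a data
   flow signature (three distinct structures over [0, 1]). */
static int signature(double k1) {
    if (k1 < 0.3) return 0;
    if (k1 < 0.7) return 1;
    return 2;
}

/* Count new signatures strictly inside (lo, hi). If both end-points
   yield the same signature, the segment is skipped; otherwise it is
   halved and the procedure is applied to the resulting segments. */
static int search(double lo, double hi, double eps) {
    int slo = signature(lo), shi = signature(hi);
    if (slo == shi || hi - lo < eps)
        return 0;                       /* nothing new inside */
    double mid = 0.5 * (lo + hi);
    int extra = (signature(mid) != slo && signature(mid) != shi);
    return extra + search(lo, mid, eps) + search(mid, hi, eps);
}

/* Distinct data flows over [0, 1]: here the two end-points differ,
   plus whatever is discovered inside. */
int distinct_flows(void) {
    return 2 + search(0.0, 1.0, 1e-6);
}
```

Segments whose end-points agree are never subdivided, so the recursion only refines around the few coefficient values where the structure actually changes.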

Fig. 4: Complexity of the data flows found for a dot product with 20 terms (partial view). The complexity coefficient is swept from 0 to 1, while the drift coefficient is set to the complementary value, as in (3). A horizontal segment represents one or more data flows with the same complexity.

Given an arbitrary filter, the number of non-equivalent data flows that can be found by varying the cost coefficients is proportional to the filter length. As a rule, if N is the filter length, then the selective search method yields 0.5N to 1.5N non-equivalent data flows.

VI. CODE GENERATION

Generating fixed-point C code for a particular data flow is, in essence, a straightforward process. However, there are two important aspects: the declaration of the intermediary variables and the explicit data type casts [8].

The C code can be generated in two very different forms: as a long sequence of short assignments (one operator in every right-hand side) and a lot of intermediary variables, or as a single, very long arithmetic expression and a lot of


parentheses. Although both forms of code are hard to read, the first variant can be used for debugging purposes, because all the intermediary variables are declared and can be watched step by step. The second variant is preferable in case no compiler optimizations are applied.

Generating a very long line with arithmetic operators poses some problems, because the compiler must deduce the data type of some subexpressions. (In case the intermediary variables are declared, their data type is clearly stated.) Examples:

Short multiplication. The compiler might consider that the result of a multiplication between two 16-bit integers is a 16-bit integer. This is in general not desirable, because most multiply nodes produce 32-bit values; so, when the code is generated, short integers that must be multiplied are explicitly cast to long integers.
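A minimal illustration of the widening cast; the values are arbitrary, and the truncated variant shows what a 16-bit int type would keep of the same product on a 16-bit target:

```c
#include <assert.h>
#include <stdint.h>

/* With an explicit cast to long, the full 32-bit product of two
   16-bit operands is kept. */
int32_t mul_wide(int16_t a, int16_t b) {
    return (int32_t)((long)a * (long)b);
}

/* Truncated to 16 bits, as a 16-bit int would store it: only the
   low 16 bits of the product survive. */
int16_t mul_short(int16_t a, int16_t b) {
    return (int16_t)((long)a * (long)b);
}
```

300 * 300 = 90000 does not fit in 16 bits; the truncated result keeps only 90000 mod 65536 = 24464.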

Signed/unsigned arithmetic. There are cases when a signed integer is added to an unsigned integer and the result is known to be nonnegative, but the compiler assumes that it is signed. If such an integer must be shifted to the right, the compiler might perform an arithmetic (not logical) shift, which is wrong, because the most significant bit would be interpreted differently. To avoid this, additional casts are inserted when generating the code.
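The difference between the two shifts can be demonstrated with a cast to an unsigned type:

```c
#include <assert.h>
#include <stdint.h>

/* Right shift of a signed value: for negative operands this is
   implementation-defined in C, and most compilers replicate the
   sign bit (arithmetic shift). */
int32_t shift_arith(int32_t v, int s) {
    return v >> s;
}

/* Casting to unsigned first forces a logical shift: zero bits are
   shifted in, so the most significant bit is treated as a value
   bit, which is what the generated code needs. */
uint32_t shift_logical(int32_t v, int s) {
    return (uint32_t)v >> s;
}
```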

VII. RESULTS

A. Accuracy

When a fixed-point C function is generated, the error interval of its result is already known. This is the worst-case indicator computed at design time: the drift of the data flow root node.

Note: The error is considered as the difference between the floating-point value obtained with the original floating-point expression (the reference value) and the integer value obtained with the generated fixed-point code.

A more relevant accuracy indicator is the signal-to-quantization-noise ratio (SQNR), computed with the mean of the absolute reference values (S) and the mean of the absolute error values (N):

\mathrm{SQNR} = 10 \log_{10}(S / N) \qquad (4)

The SQNR values are computed on a high-speed computer (not on microcontrollers). It is important to run the fixed-point code with as many different input parameters as possible.

Fig. 5 illustrates the accuracy and the complexity of the solutions found (automatically) for a dot product with 24 terms. The accuracy is represented as the difference between the highest possible (ideal) SQNR and the SQNR of the generated code. The highest possible SQNR is defined as the SQNR of a function that would return the integer nearest to the ideal floating-point value.

    Fig. 5: Solutions found for a dot product with 24 terms, random coefficients

    within the interval [-1, 1] and variables within the interval [0, 4095]

The SQNR degrades as the number of dot product terms grows and, especially, as the complexity cost coefficient is increased.

B. Speed

The execution time of the generated fixed-point code depends on many factors:

The filter. The number of data flow nodes is directly proportional to the number of filter taps. This holds true before and after the data flow is optimized.

The cost function. Varying the cost coefficients leads to specific data flow transformation decisions (as discussed in the section Multi-Objective Search).

The code generation. If the fixed-point code is generated as one very long expression (everything inline), then most of the intermediary values are allocated in registers and, in effect, the number of load/store operations is decreased. This is especially important when no compiler optimizations are applied.

The compiler. Turning the compiler optimizations on can greatly accelerate the fixed-point code. This is worth considering especially when the intermediary variables are declared.

The microprocessor. The microprocessor capabilities are not regarded in detail, because the main purpose is to generate platform-independent code, not assembler. The only thing assumed is that there is no floating-point unit, which is characteristic of embedded microprocessors. The microprocessors used for testing are shown in Table I. Some instruction sets include integer division (something that can be used instead of a bitwise shift), but this is not a general feature and is not considered.


TABLE I
MICROPROCESSORS USED FOR TESTING

Microprocessor   Register width   Compiler
ATmega16         8-bit            IAR
MSP430F149       16-bit           IAR
STM32F           32-bit           gcc
LPC1768          32-bit           IAR

The speed factor between the fixed-point code and the floating-point code can vary within a wide range. One very important cause is the cost function used throughout the data flow optimization. For low-complexity-low-accuracy solutions the speed can be increased by 15 times or more. For high-complexity-high-accuracy solutions the speed can be increased by at least 3 times. (These results are obtained with floating-point dot products with 4-32 terms, generated randomly.)

C. Memory Usage (Flash and SRAM)

The fixed-point code in general takes slightly more Flash space (code memory) than the floating-point code.

The SRAM (data memory) usage is determined mainly by the stack requirements. The fixed-point code, if generated as a single arithmetic expression (no intermediary variables), occupies almost no stack space. The floating-point code needs a specific amount of stack, because it calls low-level functions.

VIII. CONCLUSIONS

A method for transforming floating-point expressions into integer C code for embedded processors has been described. Direct-Form I non-adaptive filters with predefined input bounds are targeted. The algorithm presented uses a parameterizable cost function and is able to produce multiple solutions for the same given floating-point expression.

    The method can be applied for FIR filters, as well as for

    IIR filters if the intervals of the output variables can be

    specified. (There is work in progress for recursive filters.)

    The generated code is tested on 8-bit, 16-bit, and 32-bit

    microprocessors, using different compilers.

There can be two major realizations of the presented algorithm: as a stand-alone application for code conversion (as it is currently implemented) or as a separate type of compiler IR optimization (which requires integration into a compiler system).

    ANNEX

    A floating-point expression is converted to fixed-point

    code, for illustrative purposes:

    0.023159746f*x[0]+0.007362494f*x[1]+0.109808266f*x[2]-

    0.8996903f*x[3]-0.52352905f*x[4]+0.34677517f*x[5]+

    0.50765723f*x[6]+0.9989124f*x[7]+0.5545187f*x[8]-

    0.73752284f*x[9]

This expression can be viewed as a FIR filter. The interval of the input variables x[0-9] is set to [0; 4095]. The coefficients are generated randomly in the interval [-1; 1]. The conversion to fixed-point takes 106 milliseconds and yields 11 non-equivalent data flows. ANSI C integer code is generated. Here is the compact form of one solution (code without intermediary variables):

((unsigned long)(1159921664L +
  (((((unsigned long)(((((unsigned long)(((((unsigned long)7720 * (unsigned long)x[1])
    + ((unsigned long)24284 * (unsigned long)x[0]))
    + (115142L * (unsigned long)x[2]))
    + (532317L * (unsigned long)x[6])) >> 2)
    + ((unsigned long)(1047435L * (unsigned long)x[7]) >> 2))
    + ((unsigned long)(581455L * (unsigned long)x[8]) >> 2))
    + (90905L * (unsigned long)x[5])) >> 1)
    - ((unsigned long)(193337L * (unsigned long)x[9]) >> 1))
    + ((-117924L) * (signed long)x[3]))
    + ((-68620L) * (signed long)x[4]))) >> 17) - 8849

The accuracy of this fixed-point code is estimated by running 1.9e+6 random test cases.

The SQNR of the fixed-point code is 38.694524 dB. This value is 0.000098 dB less than the ideal SQNR. The error distribution is as follows: in 99.80% of the cases the result of the fixed-point code is the same as the integer nearest to the floating-point expression, in 0.07% of the cases the error is 1, and in 0.13% of the cases the error is -1.

IAR Embedded Workbench for ARM is used to measure the performance of the integer code. The LPC1768, an ARM Cortex-M3 microprocessor, is selected as the target architecture. Without compiler optimizations, in the simulator, the floating-point code takes 737-754 cycles and the integer code takes 50 cycles. Thus, the execution time is decreased by approximately 15 times.

REFERENCES

[1] K. I. Kum, J. Kang, W. Sung, "AUTOSCALER for C: An Optimizing Floating-Point to Integer C Program Converter for Fixed-Point Digital Signal Processors," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, issue 9, pp. 840-848, Sep. 2000.

[2] D. Menard, D. Chillet, F. Charot, O. Sentieys, "Automatic Floating-point to Fixed-point Conversion for DSP Code Generation," in Proc. of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, Oct. 2002.

[3] C. Shi, R. W. Brodersen, "An Automated Floating-point to Fixed-point Conversion Methodology," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, vol. II, pp. 529-532, 2003.

[4] K. Han, "Automating transformations from floating-point to fixed-point," Ph.D. dissertation, Faculty of the Graduate School of the University of Texas at Austin, 1996.

[5] A. Bârleanu, V. Bitoiu, A. Stan, "Digital filter optimization for C language," in Advances in Electrical and Computer Engineering, to be published.

[6] R. B. Kearfott, "Interval Computations: Introduction, Uses, and Resources," Euromath Bulletin, vol. 2, no. 1, pp. 95-112, 1996.

[7] D. Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys (CSUR), vol. 23, issue 1, 1991.

[8] Programming languages - C, International Standard, ISO/IEC 9899:TC2.

[9] R. J. Mitchell and P. R. Minchinton, "A Note on Dividing Integers by Two," The Computer Journal, vol. 32, no. 4, Aug. 1989, p. 380.