Let's find out what a real compiler actually does with this code, shall we? I like MinGW GCC 4.3 (x86). I used "gcc test.c -O2 -S -c -Wall".
This function:
float calc_mean(unsigned char r, unsigned char g, unsigned char b)
{
return (r+b+g)/3/255.0f;
}
generates this object code (function prologue and epilogue removed for clarity; I hope the comments I added are roughly correct):
movzbl 12(%ebp), %edx  ; edx = g (zero-extended)
movzbl 8(%ebp), %eax   ; eax = r (zero-extended)
addl %eax, %edx        ; edx = r + g
movzbl 16(%ebp), %eax  ; eax = b (zero-extended)
addl %eax, %edx        ; edx = r + g + b
movl $1431655766, %eax ; eax = 0x55555556, magic constant for dividing by 3
imull %edx             ; edx:eax = sum * 0x55555556; the high half, in edx, is sum/3
flds LC0               ; load 255.0f (LC0) onto the FPU stack
pushl %edx             ; push the integer quotient sum/3
fidivrl (%esp)         ; st(0) = (integer at top of stack) / st(0), i.e. (sum/3) / 255.0f
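If you're wondering about that magic constant: 1431655766 is 0x55555556, roughly 2^32/3, so multiplying by it and keeping the high 32 bits divides by 3 without an integer divide instruction. Here is a quick C sketch you can check for yourself (the helper name div3_by_magic is mine, not the compiler's):

#include <assert.h>
#include <stdint.h>

/* Multiply by ~2^32/3 and keep the high 32 bits: the same trick the
   compiler uses with imull to avoid an integer divide. */
static int32_t div3_by_magic(int32_t sum)
{
    int64_t product = (int64_t)sum * 1431655766; /* what imull computes into edx:eax */
    return (int32_t)(product >> 32);             /* the high half that ends up in edx */
}

int main(void)
{
    /* r+g+b can only range from 0 to 3*255 = 765. */
    for (int sum = 0; sum <= 765; ++sum)
        assert(div3_by_magic(sum) == sum / 3);
    return 0;
}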
Whereas this function:
float calc_mean2(unsigned char r, unsigned char g, unsigned char b)
{
const float AVERAGE_SCALE_FACTOR = 1.f / (3.f * 255.f);
return (r+b+g) * AVERAGE_SCALE_FACTOR;
}
generates this:
movzbl 12(%ebp), %eax  ; eax = g (zero-extended)
movzbl 8(%ebp), %edx   ; edx = r (zero-extended)
addl %edx, %eax        ; eax = r + g
movzbl 16(%ebp), %edx  ; edx = b (zero-extended)
addl %edx, %eax        ; eax = r + g + b
flds LC2               ; load AVERAGE_SCALE_FACTOR (1/765, LC2) onto the FPU stack
pushl %eax             ; push the integer sum
fimull (%esp)          ; st(0) *= integer at top of stack
As you can see, the second function generates better code: the integer division by 3 (the magic-constant multiply) is gone, and the one FPU operation is a multiply instead of a divide. Compiling the first function with -freciprocal-math converts the fidivrl into an fimull, which ought to be an improvement, but the second function is still shorter. Note that the two functions aren't exactly equivalent anyway: the first truncates (r+b+g)/3 with integer division before converting to float, while the second keeps the fraction.
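A quick sketch to make that last point concrete (the test values are my own): pick a sum that isn't a multiple of 3 and the two functions disagree.

#include <stdio.h>

float calc_mean(unsigned char r, unsigned char g, unsigned char b)
{
    return (r+b+g)/3/255.0f;
}

float calc_mean2(unsigned char r, unsigned char g, unsigned char b)
{
    const float AVERAGE_SCALE_FACTOR = 1.f / (3.f * 255.f);
    return (r+b+g) * AVERAGE_SCALE_FACTOR;
}

int main(void)
{
    /* r+g+b = 4 is not divisible by 3: the integer 4/3 truncates to 1. */
    printf("calc_mean  = %f\n", calc_mean(1, 2, 1));  /* 1/255 ~= 0.003922 */
    printf("calc_mean2 = %f\n", calc_mean2(1, 2, 1)); /* 4/765 ~= 0.005229 */
    return 0;
}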
However, if you consider that a modern desktop CPU has something like an 18-stage pipeline and can execute several of these instructions per cycle, you can see that the performance of either function will be dominated by stalls on data dependencies. Ideally your program has this snippet inlined into a loop with some unrolling, so that independent iterations can overlap, as in the sketch below.
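Something like this is what I have in mind; the interleaved-RGB layout and the name mean_image are my own assumptions about the surrounding code, not anything from the original:

#include <stddef.h>

static inline float calc_mean2(unsigned char r, unsigned char g, unsigned char b)
{
    const float AVERAGE_SCALE_FACTOR = 1.f / (3.f * 255.f);
    return (r + g + b) * AVERAGE_SCALE_FACTOR;
}

/* Assumed layout: interleaved 8-bit RGB in, one float mean per pixel out. */
void mean_image(const unsigned char *rgb, float *out, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i)
        out[i] = calc_mean2(rgb[3*i], rgb[3*i + 1], rgb[3*i + 2]);
}

At -O2/-O3 the compiler is free to inline the call and unroll (or even vectorise) the loop, so the latency of any one multiply is hidden behind its neighbours.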
Considering such a small code fragment in isolation isn't ideal. It's a bit like driving a car with binoculars glued to your eye sockets. Zoom out man!