Let's find out what a real compiler actually does with this code, shall we? I like MinGW GCC 4.3 (x86). I used "gcc test.c -O2 -S -c -Wall".
This function:
float calc_mean(unsigned char r, unsigned char g, unsigned char b)
{
return (r+b+g)/3/255.0f;
}
generates this object code (function prologue and epilogue removed for clarity; I hope the comments I added are roughly correct):
movzbl 12(%ebp), %edx  ; edx = g (zero-extended)
movzbl 8(%ebp), %eax   ; eax = r (zero-extended)
addl %eax, %edx        ; edx = r + g
movzbl 16(%ebp), %eax  ; eax = b (zero-extended)
addl %eax, %edx        ; edx = r + g + b
movl $1431655766, %eax ; eax = 0x55555556, magic constant for dividing by 3
imull %edx             ; edx:eax = sum * 0x55555556; the high half, in edx, is sum/3
flds LC0               ; load 255.0f (LC0) onto the FPU stack
pushl %edx             ; push the integer quotient sum/3
fidivrl (%esp)         ; st(0) = (integer at top of stack) / st(0), i.e. (sum/3) / 255.0f
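If you're wondering about that magic constant: 1431655766 is 0x55555556, roughly 2^32/3, so multiplying by it and keeping the high 32 bits divides by 3 without an integer divide instruction. Here is a quick C sketch you can check for yourself (the helper name div3_by_magic is mine, not the compiler's):

#include <assert.h>
#include <stdint.h>

/* Multiply by ~2^32/3 and keep the high 32 bits: the same trick the
   compiler uses with imull to avoid an integer divide. */
static int32_t div3_by_magic(int32_t sum)
{
    int64_t product = (int64_t)sum * 1431655766; /* what imull computes into edx:eax */
    return (int32_t)(product >> 32);             /* the high half that ends up in edx */
}

int main(void)
{
    /* r+g+b can only range from 0 to 3*255 = 765. */
    for (int sum = 0; sum <= 765; ++sum)
        assert(div3_by_magic(sum) == sum / 3);
    return 0;
}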
Whereas this function:
float calc_mean2(unsigned char r, unsigned char g, unsigned char b)
{
const float AVERAGE_SCALE_FACTOR = 1.f / (3.f * 255.f);
return (r+b+g) * AVERAGE_SCALE_FACTOR;
}
generates this:
movzbl 12(%ebp), %eax  ; eax = g (zero-extended)
movzbl 8(%ebp), %edx   ; edx = r (zero-extended)
addl %edx, %eax        ; eax = r + g
movzbl 16(%ebp), %edx  ; edx = b (zero-extended)
addl %edx, %eax        ; eax = r + g + b
flds LC2               ; load AVERAGE_SCALE_FACTOR (1/765, LC2) onto the FPU stack
pushl %eax             ; push the integer sum
fimull (%esp)          ; st(0) *= integer at top of stack
As you can see, the second function generates better code: the integer division by 3 (the magic-constant multiply) is gone, and the one FPU operation is a multiply instead of a divide. Compiling the first function with -freciprocal-math converts the fidivrl into an fimull, which ought to be an improvement, but the second function is still shorter. Note that the two functions aren't exactly equivalent anyway: the first truncates (r+b+g)/3 with integer division before converting to float, while the second keeps the fraction.
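A quick sketch to make that last point concrete (the test values are my own): pick a sum that isn't a multiple of 3 and the two functions disagree.

#include <stdio.h>

float calc_mean(unsigned char r, unsigned char g, unsigned char b)
{
    return (r+b+g)/3/255.0f;
}

float calc_mean2(unsigned char r, unsigned char g, unsigned char b)
{
    const float AVERAGE_SCALE_FACTOR = 1.f / (3.f * 255.f);
    return (r+b+g) * AVERAGE_SCALE_FACTOR;
}

int main(void)
{
    /* r+g+b = 4 is not divisible by 3: the integer 4/3 truncates to 1. */
    printf("calc_mean  = %f\n", calc_mean(1, 2, 1));  /* 1/255 ~= 0.003922 */
    printf("calc_mean2 = %f\n", calc_mean2(1, 2, 1)); /* 4/765 ~= 0.005229 */
    return 0;
}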
However, if you consider that a modern desktop CPU has something like an 18-stage pipeline and can execute several of these instructions per cycle, you can see that the performance of either function will be dominated by stalls on data dependencies. Ideally your program has this snippet inlined into a loop with some unrolling, so that independent iterations can overlap, as in the sketch below.
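Something like this is what I have in mind; the interleaved-RGB layout and the name mean_image are my own assumptions about the surrounding code, not anything from the original:

#include <stddef.h>

static inline float calc_mean2(unsigned char r, unsigned char g, unsigned char b)
{
    const float AVERAGE_SCALE_FACTOR = 1.f / (3.f * 255.f);
    return (r + g + b) * AVERAGE_SCALE_FACTOR;
}

/* Assumed layout: interleaved 8-bit RGB in, one float mean per pixel out. */
void mean_image(const unsigned char *rgb, float *out, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i)
        out[i] = calc_mean2(rgb[3*i], rgb[3*i + 1], rgb[3*i + 2]);
}

At -O2/-O3 the compiler is free to inline the call and unroll (or even vectorise) the loop, so the latency of any one multiply is hidden behind its neighbours.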
Considering such a small code fragment in isolation isn't ideal. It's a bit like driving a car with binoculars glued to your eye sockets. Zoom out man!