Hey,
I'm running the following benchmark:
#include <stdint.h>
#include <stdlib.h>

char *func_a(uint64_t id, char *d);  /* defined below, same file as main() */
char *func_b(uint64_t id, char *d);  /* defined in func_overhead_plus.c */

int main(int argc, char **argv)
{
    char *d = malloc(sizeof(char) * 13);  /* 12 characters + terminating '\0' */

    TIME_THIS(func_a(999, d), 99999999);
    TIME_THIS(func_b(999, d), 99999999);
    return 0;
}
With normal compilation (no optimization flags), the results are about the same for both functions:
% gcc func_overhead.c func_overhead_plus.c -o func_overhead && ./func_overhead
[func_a(999, d) ] 9276227.73
[func_b(999, d) ] 9265085.90
But with -O3, they are very different:
% gcc -O3 func_overhead.c func_overhead_plus.c -o func_overhead && ./func_overhead
[func_a(999, d) ] 178580674.69
[func_b(999, d) ] 48450175.29
func_a and func_b are defined like this:
/* Formats the low 40 bits of id as hex, "XXXX/XXX/XXX", into d
   (12 characters plus the terminating '\0', hence the 13-byte buffer). */
char *func_a(uint64_t id, char *d)
{
    register size_t i, j;
    register char c;

    for (i = 0, j = 36; i <= 11; i++)
        if (i == 4 || i == 8)
            d[i] = '/';
        else {
            c = ((id >> j) & 0xf) + '0';
            if (c > '9')
                c = c - '9' - 1 + 'A';  /* map 10..15 to 'A'..'F' */
            d[i] = c;
            j -= 4;
        }
    d[12] = '\0';
    return d;
}
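(For reference, a quick sanity check, not part of the benchmark itself; buf is just a scratch buffer I'm introducing here:)

char buf[13];
printf("%s\n", func_a(999, buf));  /* prints 0000/000/3E7 */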
The only difference is that func_a is in the same file as main() and func_b is in the func_overhead_plus.c file.
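My first suspicion is inlining (see the last edit below). I suppose one way to check is to look at the generated code; if the call was inlined, the call instruction to func_a should vanish from main's loop:

% gcc -O3 -S func_overhead.c
% objdump -d func_overhead | grep 'call.*func'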
I'm wondering if anyone could elaborate on what's going on
Thanks
Edit:
Sorry about all the confusion regarding the results: they are actually calls per second (runs / elapsed seconds), so func_a() is faster than func_b() with -O3.
TIME_THIS is defined like so:
#include <stdio.h>
#include <sys/time.h>

/* Wall-clock time in seconds, with microsecond resolution. */
double get_time(void)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec * 1e-6;
}

#define TIME_THIS(func, runs) do {                      \
        double t0, td;                                  \
        int i;                                          \
        t0 = get_time();                                \
        for (i = 0; i < runs; i++)                      \
            func;                                       \
        td = get_time() - t0;                           \
        printf("[%-35s] %15.2f\n", #func, runs / td);   \
    } while (0)
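One thing that occurs to me now: nothing ever reads d back, so once a call is inlined the optimizer is free to hoist or delete parts of it. Here is a variant that forces the result to be observed every iteration; just a sketch, the volatile sink and the macro name are my own additions, and it relies on both functions returning the buffer:

volatile char sink;

#define TIME_THIS_OBSERVED(func, runs) do {             \
        double t0, td;                                  \
        int i;                                          \
        t0 = get_time();                                \
        for (i = 0; i < runs; i++)                      \
            sink = *(func); /* read result back */      \
        td = get_time() - t0;                           \
        printf("[%-35s] %15.2f\n", #func, runs / td);   \
    } while (0)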
The platform is Linux:
Linux komiko 2.6.30-gentoo-r2 #1 SMP PREEMPT Wed Jul 15 17:27:51 IDT 2009 i686 Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz GenuineIntel GNU/Linux
gcc is 4.3.3
As suggested, here are the results of interleaving the calls a little:
-O3
[func_b(999, d) ] 48926120.09
[func_a(999, d) ] 135299870.52
[func_b(999, d) ] 49075900.30
[func_a(999, d) ] 135748939.12
[func_b(999, d) ] 49039535.67
[func_a(999, d) ] 134055084.58
-O2
[func_b(999, d) ] 27243732.97
[func_a(999, d) ] 27341371.38
[func_b(999, d) ] 27303284.93
[func_a(999, d) ] 27349177.65
[func_b(999, d) ] 27325398.25
[func_a(999, d) ] 27343935.88
(-O1 and -Os were the same as -O2 in this test)
no optimizations
[func_b(999, d) ] 8852314.88
[func_a(999, d) ] 9646166.81
[func_b(999, d) ] 8909973.33
[func_a(999, d) ] 9734883.99
[func_b(999, d) ] 8726127.49
[func_a(999, d) ] 9566052.21
Looks like the unoptimized build behaves like -O3 in that func_a seems to be faster than func_b.
Just for fun, compiling with gcc 4.4.4 gives interesting results:
no optimizations
[func_b(999, d) ] 16982343.03
[func_a(999, d) ] 19693688.36
[func_b(999, d) ] 17260359.40
[func_a(999, d) ] 18137352.08
[func_b(999, d) ] 16790465.45
[func_a(999, d) ] 19828836.94
-O3
[func_b(999, d) ] 52184739.72
[func_a(999, d) ] 99999237556468.61
[func_b(999, d) ] 52430823.56
[func_a(999, d) ] 101030101.92
[func_b(999, d) ] 52404446.52
[func_a(999, d) ] 100842538.40
This is pretty weird, isn't it? (That ~1e14 reading for func_a suggests the measured time was practically zero, as if the loop had been optimized away almost entirely.)
Edit:
If the performance difference is indeed due to gcc 4.3/4.4 being unable to inline across object files, should it be considered good practice to put performance-critical functions in the same file?
e.g.
#include "performance_critical.c"
or is it just messy and most likely not really significant?
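Or would the cleaner equivalent be a header with a static inline definition? Something like this sketch (performance_critical.h is a name I'm making up here):

/* performance_critical.h, hypothetical: every file that includes this
   gets its own inlinable copy of the function. */
#ifndef PERFORMANCE_CRITICAL_H
#define PERFORMANCE_CRITICAL_H

#include <stdint.h>

static inline char *func_b(uint64_t id, char *d)
{
    /* ... same body as func_a above ... */
    return d;
}

#endif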
Thanks