I don't know. But I do know how to write a quick benchmark with no guarantees of scientific validity (actually, one with rather strict guarantees of invalidity). It has interesting results:
#include <time.h>
#include <stdio.h>
int main(void)
{
int i;
int s;
clock_t start_time, end_time;
int centiseconds;
start_time = clock();
s = 1;
for (i = 0; i < 1000000000; i++)
{
s = s + i;
}
end_time = clock();
centiseconds = (end_time - start_time)*100 / CLOCKS_PER_SEC;
printf("Answer is %d; Forward took %ld centiseconds\n", s, centiseconds);
start_time = clock();
s = 1;
for (i = 999999999; i >= 0; i--)
{
s = s + i;
}
end_time = clock();
centiseconds = (end_time - start_time)*100 / CLOCKS_PER_SEC;
printf("Answer is %d; Backward took %ld centiseconds\n", s, centiseconds);
return 0;
}
Compiled with -O9 using gcc 3.4.4 on Cygwin, running on an "AMD Athlon(tm) 64 Processor 3500+" (2211 MHz) in 32 bit Windows XP:
Answer is -1243309311; Forward took 93 centiseconds
Answer is -1243309311; Backward took 92 centiseconds
(Answers varied by 1 either way in several repetitions.)
Compiled with -I9 using gcc 4.4.1 running on an "Intel(R) Atom(TM) CPU N270 @ 1.60GHz" (800 MHz and presumably only one core, given the program) in 32 bit Ubuntu Linux.
Answer is -1243309311; Forward took 196 centiseconds
Answer is -1243309311; Backward took 228 centiseconds
(Answers varied by 1 either way in several repetitions.)
Looking at the code, the forward loop is translated to:
; Gcc 3.4.4 on Cygwin for Athlon ; Gcc 4.4.1 on Ubuntu for Atom
L5: .L2:
addl %eax, %ebx addl %eax, %ebx
incl %eax addl $1, %eax
cmpl $999999999, %eax cmpl $1000000000, %eax
jle L5 jne .L2
The backward to:
L9: .L3:
addl %eax, %ebx addl %eax, %ebx
decl %eax subl $1, $eax
jns L9 cmpl $-1, %eax
jne .L3
Which shows, if not much else, that GCC's behaviour has changed between those two versions!
Pasting the older GCC's loops into the newer GCC's asm file gives results of:
Answer is -1243309311; Forward took 194 centiseconds
Answer is -1243309311; Backward took 133 centiseconds
Summary: on the >5 year old Athlon, the loops generated by GCC 3.4.4 are the same speed. On the newish (<1 year?) Atom, the backward loop is significantly faster. GCC 4.4.1 has a slight regression for this particular case which I personally am not bothered about in the least, given the point of it. (I had to make sure that s
is used after the loop, because otherwise the compiler would elide the computation altogether.)
[1] I can never remember the command for system info...