For the record, gcc, when compiling with optimization specifically disabled (-O0), produces different code for the two inputs (in my case, the body of foo was return rand(); so that the result would not be determined at compile time).
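For reference, the two inputs can be sketched roughly as below. This is a reconstruction under assumptions, not the exact test file: the function names without_temporary and with_temporary are hypothetical, and the bodies of the if blocks are placeholders. Only foo, t, and return rand(); come from the description above.

```c
#include <stdlib.h>

/* Stand-in for foo(); rand() keeps the result unknown at compile time. */
int foo(void) {
    return rand();
}

/* Hypothetical name: tests the return value of foo() directly. */
int without_temporary(void) {
    if (foo()) {
        return 1; /* placeholder for the body of the if block */
    }
    return 0;
}

/* Hypothetical name: copies the return value into t, then tests t. */
int with_temporary(void) {
    int t = foo();
    if (t) {
        return 1; /* placeholder for the body of the if block */
    }
    return 0;
}
```

Compiling a file like this with gcc -O0 -S and again with gcc -O1 -S should let you compare the generated assembly for the two functions yourself; the exact labels and stack offsets will vary with compiler version and target.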
Without temporary variable t:
movl $0, %eax
call foo
testl %eax, %eax
je .L4
/* inside of if block */
.L4:
/* rest of main() */
Here, the return value of foo is stored in the EAX register, and the register is tested against itself to see if it is 0, and if so, it jumps over the body of the if block.
With temporary variable t:
movl $0, %eax
call foo
movl %eax, -4(%rbp)
cmpl $0, -4(%rbp)
je .L4
/* inside of if block */
.L4:
/* rest of main() */
Here, the return value of foo is stored in the EAX register, then copied into the stack slot for t at -4(%rbp). (Strictly speaking this is a plain store into the current stack frame, not a push.) Then, the contents of that location on the stack are compared to literal 0, and if they are equal, it jumps over the body of the if block.
And so if we assume further that the processor is not doing any "optimizations" when it generates the microcode for this, then the version without the temporary should be a few clock cycles faster. It's not going to be substantially faster because, even though the version with a temporary involves a store to the stack, the stack value is almost certainly still going to be in the processor's L1 cache when the comparison instruction is executed immediately afterwards, and so there's not going to be a round trip to RAM.
Of course the code becomes identical as soon as you turn on any optimization level, even -O1, and who compiles anything that is so critical that they care about a handful of clock cycles with all optimizations off?
Edit: With regard to your further information about your hardware engineer friend, I can't see how accessing a value in the L1 cache would ever be faster than accessing a register directly. I could see it being just about as fast if the value never even leaves the pipeline, but I can't see it being faster, especially since it still has to execute the movl instruction in addition to the comparison. But show him the assembly code above and ask what he thinks; it will be more productive than trying to discuss the problem in terms of C.