ansaurus

Question

Answer 1

A:

A will likely be just a tiny bit faster because it does not do a variable assignment. The difference we're talking about is way too small to measure.

Tom Cabanski 2010-04-25 01:21:41

Err.. yes it does. The assignment is to an unnamed temporary variable, but the assignment still occurs.

Billy ONeal 2010-04-25 01:32:44

Yes, [B] really says 'store foo()'s result to a temporary unnamed int and compare if it is nonzero'.

AshleysBrain 2010-04-25 01:36:24

A value doesn't exist unless it's stored somewhere.

WhirlWind 2010-04-25 01:48:15

@WhirlWind: And that storage location is (most likely) some location on the stack or in a CPU register. Just because a variable is not named does not mean the compiler doesn't have to put it somewhere.

Billy ONeal 2010-04-25 02:54:49

I wonder if the author thinks he's coding in an interpreter where names must be looked up when accessed?

Wallacoloo 2010-04-25 03:35:49

Answer 2

+13 A:

The "optimisation" required to convert [B] into [A] is so trivial (especially if t is not used anywhere else) that the compiler probably won't even call it an optimisation. It might be something that it just does as a matter of course, whether or not optimisations are explicitly enabled.

The only way to tell is to ask your compiler to generate an assembly source listing for both bits of code, then compare them.
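A minimal sketch of such a comparison. The question's snippets are not reproduced on this page, so the shapes of [A] and [B] below are assumptions based on the answers, and foo(), the function names, and the file names are placeholders:

```cpp
#include <cstdlib>

// Hypothetical stand-in for the question's foo(); rand() keeps the result
// from being known at compile time.
int foo() { return std::rand(); }

// Assumed shape of [A]: test the return value directly.
int branch_direct() {
    if (foo()) {
        return 1;
    }
    return 0;
}

// Assumed shape of [B]: store the return value in a named temporary first.
int branch_via_temporary() {
    int t = foo();
    if (t) {
        return 1;
    }
    return 0;
}

// To produce the assembly listings (file names are placeholders):
//   g++ -O0 -S compare.cpp -o compare_O0.s
//   g++ -O1 -S compare.cpp -o compare_O1.s
```

Comparing the bodies of the two functions in each listing shows whether your compiler treats them differently at that optimization level.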

Greg Hewgill 2010-04-25 01:29:44

Hi Greg, totally agree with what you said. But we were discussing the theoretical, cycle-by-cycle difference between the two code snippets. +1 for the answer, but I was looking specifically for a hardware-level explanation.

Gollum 2010-04-25 03:42:29

Answer 3

+2 A:

They are likely both going to be the same. That int will be stored into a register in either case.

Michael 2010-04-25 01:33:52

Answer 4

+3 A:

Executive Summary
1. We are talking about nanoseconds. Light moves a whopping 30cm in that time.
2. Sometimes, if you are really lucky, [A] is faster

Side note: [B] may have a different meaning
If the return type of foo is not int but an object that has implicit conversions to both int and bool, [A] and [B] execute different code paths. One of them might contain a Sleep.
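A small sketch of that situation, assuming [A] is `if (foo())` and [B] is `int t = foo(); if (t)`. The Status type and the counters are made up for illustration:

```cpp
// Counters that record which conversion operator ran.
static int bool_conversions = 0;
static int int_conversions = 0;

// Hypothetical return type with implicit conversions to both bool and int.
struct Status {
    int code;
    operator bool() const { ++bool_conversions; return code != 0; }
    operator int() const { ++int_conversions; return code; }
};

Status foo() { return Status{7}; }

// Assumed [A]: the condition converts the object to bool, so operator bool runs.
bool variant_a() {
    if (foo()) {
        return true;
    }
    return false;
}

// Assumed [B]: initializing an int picks operator int; the later test is a
// plain int comparison and calls neither operator.
bool variant_b() {
    int t = foo();
    if (t) {
        return true;
    }
    return false;
}
```

Both variants take the same branch here, but they get there through different user-defined functions, and either one could be arbitrarily expensive.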

Assuming a function returning int:

Depends on the compiler
Even with the restriction of "no optimization", there is no guarantee what the generated code will look like. B could be 10 times faster and the compiler would still be compliant (and you most likely wouldn't notice).

Depends on the hardware
Depending on your architecture, there might not even be a difference for the generated code, no matter how much your compiler tries.

Assuming a modern compiler on a modern x86 / x64 architecture:

On typical compilers, the difference is at most minuscule
With a compiler that stores t in a stack variable, the two extra stack accesses typically cost 2 clock cycles (less than a nanosecond on my CPU). That is negligible compared to the "surrounding cost": a call to foo, the cost of foo itself, and a branch. An unoptimized call with a full stack frame can easily cost you 20 to 200 cycles depending on platform.

For comparison, consider the cycle cost of a single memory access that misses the 1st level cache (roughly: 100 cycles from 2nd level, 1000 from main memory, hundreds of thousands from disk).
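To put those cycle counts in time units, here is a rough conversion sketch. The helper names are made up, and the clock frequency is whatever you assume for your own CPU:

```cpp
// Convert a cycle count to nanoseconds for a given clock frequency in GHz.
// At 1 GHz, one cycle takes exactly one nanosecond.
constexpr double cycles_to_ns(double cycles, double clock_ghz) {
    return cycles / clock_ghz;
}

// Distance light travels in vacuum during the given time, in centimetres.
// The speed of light is about 29.98 cm per nanosecond.
constexpr double light_distance_cm(double ns) {
    return 29.9792458 * ns;
}
```

For example, `cycles_to_ns(2.0, 3.0)` gives the time for 2 cycles on an assumed 3 GHz core, which is below one nanosecond.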

...or even nonexistent
Even if your compiler isn't optimizing, your CPU might. Due to pairing / microcode generation, the cycle cost may actually be identical.

peterchen 2010-04-25 02:04:30

Answer 5

+3 A:

For the record, gcc, when compiling with optimization specifically disabled (-O0), produces different code for the two inputs (in my case, the body of foo was return rand(); so that the result would not be determined at compile time).

Without temporary variable t:

        movl    $0, %eax
        call    foo
        testl   %eax, %eax
        je      .L4
        /* inside of if block */
.L4:
        /* rest of main() */

Here, the return value of foo is stored in the EAX register, and the register is tested against itself to see if it is 0, and if so, it jumps over the body of the if block.

With temporary variable t:

        movl    $0, %eax
        call    foo
        movl    %eax, -4(%rbp)
        cmpl    $0, -4(%rbp)
        je      .L4
        /* inside of if block */
.L4:
        /* rest of main() */

Here, the return value of foo is stored in the EAX register, then copied to a slot on the stack (-4(%rbp)). Then, the contents of that stack location are compared to literal 0, and if they are equal, it jumps over the body of the if block.

And so if we assume further that the processor is not doing any "optimizations" when it generates the microcode for this, then the version without the temporary should be a few clock cycles faster. It's not going to be substantially faster because even though the version with a temporary involves a store to the stack, the stack value is almost certainly still going to be in the processor's L1 cache when the comparison instruction is executed immediately afterwards, and so there's not going to be a round trip to RAM.

Of course the code becomes identical as soon as you turn on any optimization level, even -O1, and who compiles anything that is so critical that they care about a handful of clock cycles with all optimizations off?
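If you want to try measuring that handful of cycles anyway, here is a rough timing sketch. Everything in it is an assumption for illustration: foo() is a stand-in, and `volatile` is used only to force the store and reload of t that -O0 produced above, so that the two loops stay different even with optimization on. Results will vary by compiler and CPU, and an optimizer may still transform either loop.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for foo(); the counter makes the result vary per call.
static std::uint32_t counter = 0;
int foo() { return static_cast<int>(++counter & 1u); }

// Version without the temporary: test the return value directly.
std::uint64_t count_direct(std::uint64_t iterations) {
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        if (foo()) ++hits;
    }
    return hits;
}

// Version with the temporary: volatile forces a real store and reload of t.
std::uint64_t count_via_temporary(std::uint64_t iterations) {
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        volatile int t = foo();
        if (t) ++hits;
    }
    return hits;
}

// Average wall-clock nanoseconds per iteration for one variant.
template <typename Fn>
double ns_per_iteration(Fn fn, std::uint64_t iterations, std::uint64_t& hits_out) {
    auto start = std::chrono::steady_clock::now();
    hits_out = fn(iterations);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Call `ns_per_iteration(count_direct, n, hits)` and `ns_per_iteration(count_via_temporary, n, hits)` with a large n and compare; expect the call and branch to dominate either way.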

Edit: With regard to your further information about your hardware engineer friend, I can't see how accessing a value in the L1 cache would ever be faster than accessing a register directly. I could see it being just about as fast if the value never even leaves the pipeline, but I can't see it being faster, especially since it still has to execute the movl instruction in addition to the comparison. But show him the assembly code above and ask what he thinks; it will be more productive than trying to discuss the problem in terms of C.

Tyler McHenry 2010-04-25 02:41:30

That was exactly my point. And +1 for posting the machine code :-) That proves the point to any sane human being :) Thanks a lot.

Gollum 2010-04-25 03:39:31

Answer 6

A:

It really depends on how the compiler is built. But I think in most cases, A will be faster. Here's why:

In B, the compiler might not bother finding out whether t is ever used again, so it will be forced to preserve the value after the if statement. And that could mean pushing it onto the stack.

Wallacoloo 2010-04-25 03:34:31


nested function call faster or not ?
