Update: The actual cause was that a different compile box served my compile request. In the slower instance I was running code compiled on SuSE 9 on a SuSE 10 box. That difference was enough for me to drop it and compare apples to apples. When using the same compile box, the results were as follows:

g++ was about two percent slower

Delta real: 4 minutes, delta user: 4 minutes, delta system: 5 seconds

Thanks!

gcc v4.3 vs. g++ v4.3, reduced to the simplest case and compiled with nothing but simple flags:

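#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv)
{
    int i=0;
    int j=0;
    int k=0;
    int m=0;
    int n=0;
    /* 1000 * 6000 * 12000 = 72 billion inner iterations. m and n are never
       read afterwards, so this only measures the loop when built without
       optimization; an optimizing build may delete the loops entirely. */
    for (i=0;i<1000;i++)
        for (j=0;j<6000;j++)
            for (k=0;k<12000;k++)
            {
                 m = i+j+k;
                 n=(m+1+1);
            }
    return 0;
}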

Is this a known issue? The 15% difference is very reproducible and shows up across the board in real, system, and user time. I have to wait until tomorrow to post the assembly.

Update: I have only tried it on one of my compile boxes. I am using SuSE 10.

+2  A: 

In order to figure out why it's slower, you'll probably need to look at the assembly produced by each compiler. The g++ compiler must be doing something different from the gcc compiler.

mezoid
Same compiler, different flags. In particular, g++ passes GCC the "compile as C++" flag.
MSalters
A: 

Oh, that is a fun one. But the code you gave us doesn't compile. You need

(int argc, char** argv)
Charlie Martin
See the comments section on the question. Thanks!
ojblass
+7  A: 

When compiled with gcc and g++, the only difference I see is in the first 4 lines.

gcc:

    .file "loops.c"
    .def ___main; .scl 2; .type 32; .endef
    .text
.globl _main

g++:

    .file "loops.c"
    .def ___main; .scl 2; .type 32; .endef
    .text
    .align 2
.globl _main

As you can see, the only difference is the ".align 2" directive that g++ emits before _main, which aligns the function on a word boundary. This tiny difference seems to account for the significant performance difference.

Here is a page explaining structure alignment. Although it is written for ARM/NetWinder, it is still applicable because it discusses how alignment works on modern CPUs. In particular, read section 7, "What are the disadvantages of word alignment?":

http://netwinder.osuosl.org/users/b/brianbr/public_html/alignment.html

And here is a reference on the .align pseudo-op:

http://www.nersc.gov/vendor_docs/ibm/asm/align.htm

Benchmarks as requested:

gcc:

john@awesome:~$ time ./loopsC

real    0m21.212s
user    0m20.957s
sys     0m0.004s

g++:

john@awesome:~$ time ./loopsGPP

real    0m22.111s
user    0m21.817s
sys     0m0.000s

I reduced the innermost loop bound to 1200. The difference isn't as large as I had hoped, but then again the assembly output was generated on Windows and the timings were done on Linux. Maybe MinGW handles alignment differently behind the scenes than gcc on Linux does.
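For scale, the relative slowdowns implied by the timings above (computed only from the posted numbers, nothing newly measured), together with the size of the shortened run:

$$\frac{22.111 - 21.212}{21.212} = \frac{0.899}{21.212} \approx 4.2\%\ (\text{real}), \qquad \frac{21.817 - 20.957}{20.957} = \frac{0.860}{20.957} \approx 4.1\%\ (\text{user})$$

$$1000 \times 6000 \times 1200 = 7.2 \times 10^{9} \text{ inner iterations, versus } 1000 \times 6000 \times 12000 = 7.2 \times 10^{10} \text{ in the original}$$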

John T
What version of gcc are you using?
ojblass
Could the align negatively impact performance?
ojblass
4.4.0 (latest as of this post)
John T
Can you run time on both versions of the executable? Every single time I got a really significant difference.
ojblass
@John, g++ is doing proper alignment. So, shouldn't that be faster?
chappar
Maybe it's the startup and teardown code instead?
ojblass
Hmm... 72 billion iterations might take a little while.
John T
I think my box is a bit polluted with libraries newer than on the others, for reasons I have not gotten to the bottom of. Eager to see the numbers.
ojblass
I tried to reduce startup time to noise; you can reduce the loop counts.
ojblass
Alright I am going to try it on a cleaner box tomorrow. Thank you so much.
ojblass
+1  A: 

One reason could be that gcc optimized the assignments to m and n so that they can run in parallel.

That can be done like this:

m = i+j+k;
n = i+j+k+2;

I am not sure this could improve performance by 15%. It might give a bit of a performance boost on a superscalar CPU, which can execute independent instructions in parallel. The best way to find out is to compare the assembly code from the two compilers.

chappar
Maybe an optimized alignment?
ojblass