views:

213

answers:

4

I'm distributing a C++ program with a makefile for the Unix version, and I'm wondering what compiler options I should use to get the fastest possible code (it falls into the category of programs that can use all the computing power they can get and still come back for more), given that I don't know in advance what hardware, operating system or gcc version the user will have, and I want above all else to make sure it at least works correctly on every major Unix-like operating system.

So far I have g++ -O3 -Wno-write-strings. Are there any other options I should add? On Windows, the Microsoft compiler has options worth using for things like the fast calling convention and link-time code generation; are there equivalents in GCC?

(I'm assuming it will default to 64-bit on a 64-bit platform; please correct me if that's not the case.)

+1  A: 

-Ofast

Please try -Ofast instead of -O3 (capital O; a lowercase -o names the output file instead).

Also, here is a list of floating-point flags you might want to enable selectively:

-ffloat-store

-fexcess-precision=style (style is fast or standard)

-ffast-math

-fno-rounding-math

-fno-signaling-nans

-fcx-limited-range

-fno-math-errno

-funsafe-math-optimizations

-fassociative-math

-freciprocal-math

-ffinite-math-only

-fno-signed-zeros

-fno-trapping-math

-frounding-math

-fsingle-precision-constant

-fcx-fortran-rules

A complete list of the flags, with detailed descriptions, is available in the GCC documentation.
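Several flags in that list (-fassociative-math, and -ffast-math, which enables it) let GCC regroup floating-point additions. That is not a free choice: IEEE-754 double addition is not associative, so regrouping can change the last bits of a result. A minimal sketch, assuming a typical x86-64 build with no fast-math flags; the function names are illustrative only:

```cpp
// Two groupings of the same three-term sum. Under default IEEE-754 double
// semantics (no -ffast-math / -fassociative-math) GCC must evaluate each one
// exactly as written, and for these inputs the results differ in the last bit.
double sum_left(double a, double b, double c) {
    return (a + b) + c;  // (0.1 + 0.2) + 0.3 gives 0.6000000000000001
}

double sum_right(double a, double b, double c) {
    return a + (b + c);  // 0.1 + (0.2 + 0.3) gives 0.6
}
```

With -fassociative-math in effect, GCC is allowed to treat these two functions as interchangeable, which is exactly what makes reductions vectorizable and also what makes results differ between builds.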

Good luck!
- CVS

CVS-2600Hertz
Also check out Il-Bhima's neat explanation.
CVS-2600Hertz
+5  A: 

Without knowing any specifics of your program it's hard to say. -O3 covers most of the optimisations. The remaining options come "at a cost". If you can tolerate small rounding differences and your code doesn't depend on strict IEEE floating-point semantics, you can try -Ofast. It disregards strict standards compliance and can give you faster code.
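One concrete way -Ofast can break code that relies on IEEE semantics: it enables -ffast-math, which includes -ffinite-math-only, so GCC may assume NaNs never occur and fold a std::isnan check to false. A minimal sketch of code that would be affected; the function name and the NaN-as-missing-value convention are illustrative assumptions, not something from the question:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Averages the entries of v, skipping NaN entries used as "missing" markers.
// Correct under default IEEE semantics (-O2/-O3). Under -ffinite-math-only
// (implied by -ffast-math and -Ofast) GCC may treat std::isnan() as always
// false, so NaNs can leak into the sum and the result becomes NaN.
double mean_ignoring_missing(const std::vector<double>& v) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double x : v) {
        if (std::isnan(x)) continue;  // skip missing values
        sum += x;
        ++count;
    }
    // No valid entries: report "missing" as NaN.
    return count ? sum / static_cast<double>(count)
                 : std::numeric_limits<double>::quiet_NaN();
}
```

If the program contains checks like this, benchmark -Ofast only after confirming the results still match an -O3 build.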

The remaining optimisation flags only improve the performance of certain programs and can even be detrimental to others. Look at the available flags in the GCC documentation on optimisation options and benchmark them.
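A minimal sketch of such a benchmark, assuming the hot path can be isolated in one function: time it with std::chrono, build the same source once per candidate flag set (for example g++ -O2, g++ -O3, g++ -O3 -ffast-math), and compare both the timings and the results. run_kernel is a hypothetical stand-in for the program's real hot loop:

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for the program's hot loop: a dot product.
double run_kernel(const std::vector<double>& a, const std::vector<double>& b) {
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) acc += a[i] * b[i];
    return acc;
}

// Runs the kernel `reps` times and returns the fastest run in milliseconds.
// The last result is stored in `checksum` so builds with different flags can
// also be compared for correctness (fast-math may change low-order bits).
double best_time_ms(const std::vector<double>& a, const std::vector<double>& b,
                    int reps, double& checksum) {
    using steady = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        const auto t0 = steady::now();
        checksum = run_kernel(a, b);
        const auto t1 = steady::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

Taking the best of several runs reduces noise from other processes; running on each target machine matters more than the exact harness.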

Another option is to enable C99 (-std=c99) and inline appropriate functions. This is a bit of an art: you shouldn't inline everything, but with a little work you can make your code faster (albeit at the cost of a larger executable).

If speed is really an issue, I would suggest either going back to Microsoft's compiler or trying Intel's. I've come to appreciate how slow some GCC-compiled code can be, especially when it involves math.h.

EDIT: Oh wait, you said C++? Then disregard my C99 paragraph, you can inline already :)
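Since C++ has inline built in, here is a minimal sketch of the kind of small, hot helper worth defining in a header so every caller can see its body. Note that inline mainly permits the definition to appear in several translation units; whether a call is actually inlined is still GCC's decision at -O2/-O3. The names blend and blend_all are illustrative:

```cpp
#include <cstddef>

// Small helper defined `inline` so it can live in a header shared by several
// .cpp files; because the body is visible at every call site, GCC can inline it.
inline double blend(double a, double b, double t) {
    return a + t * (b - a);
}

// Hot loop calling the helper; at -O2/-O3 the call is typically inlined, and
// the loop may then be vectorized.
void blend_all(const double* a, const double* b, double* out,
               std::size_t n, double t) {
    for (std::size_t i = 0; i < n; ++i) out[i] = blend(a[i], b[i], t);
}
```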

Il-Bhima
Funny, I've come to appreciate how slow MSVC-compiled code can be :-) I also don't think that applies, as the poster seems to want GCC so that it can target SPARC, PPC, "every major Unix-like operating system".
phkahler
Actually I've never used the MSVC compiler. I'm surprised it's not fast; you would think MS would have optimised the crap out of it, seeing that they probably compile 90% of their software with it. I am comparing against Intel's compiler, which I have used extensively. Yeah, I just realised that the OP wants to target most Unixes, which makes it even harder to list a fixed set of optimisation flags.
Il-Bhima
I'm compiling the Windows binary with the Microsoft compiler (about 5 to 10% faster than GCC by my tests); this is for the Unix distribution. As far as I now understand it, -Ofast etc. may (or may not) help floating-point code, but for integer code -O3 already gives you the full monty?
rwallace
OK, as far as I know (and the docs seem to agree), -Ofast is -O3 plus -ffast-math (which includes -funsafe-math-optimizations), and that applies only to floating-point math, so if you've got only integer math it's probably not going to help. However, I wouldn't say -O3 gives you the full monty: there are other options that -O3 doesn't enable because they don't guarantee faster code. Optimisation is highly program- and architecture-dependent. It's possible that disabling one of the -O3 optimisations could improve your performance. If performance is that vital, benchmark your program on a set of machines and keep a set of flags for each architecture in your makefile.
Il-Bhima
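The integer/floating-point distinction discussed above shows up clearly in reductions. At -O3 GCC may reorder and vectorize an integer sum freely, but it must keep a floating-point sum in source order unless -fassociative-math (enabled by -ffast-math, and therefore by -Ofast) allows regrouping. A minimal sketch of the two cases; the function names are illustrative, and whether a given loop actually vectorizes depends on the GCC version and target (-fopt-info-vec reports which loops were vectorized):

```cpp
#include <cstddef>
#include <cstdint>

// Integer reduction: unsigned wrap-around arithmetic is associative, so -O3
// alone lets GCC split the sum across vector lanes.
std::uint64_t sum_u64(const std::uint64_t* v, std::size_t n) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Floating-point reduction: regrouping can change the rounded result, so GCC
// keeps the sequential order unless -fassociative-math is in effect.
double sum_f64(const double* v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}
```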
+4  A: 

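I would try profile-guided optimization: compile and link with -fprofile-generate, run the program on representative input, then rebuild with -fprofile-use. From the GCC manual: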

-fprofile-generate Enable options usually used for instrumenting application to produce profile useful for later recompilation with profile feedback based optimization. You must use -fprofile-generate both when compiling and when linking your program. The following options are enabled: -fprofile-arcs, -fprofile-values, -fvpt.
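A minimal sketch of where profile feedback helps, with the build sequence as comments. pgo_demo.cpp is a hypothetical file name, assumed to also contain a main() that feeds total() representative input. The function has a data-dependent branch: nothing in the source tells GCC which side is hot, but the recorded profile does.

```cpp
// Build sequence (file name and input are hypothetical):
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo
//   ./pgo_demo          # running it writes .gcda profile data
//   g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo
#include <cmath>
#include <vector>

// Data-dependent branch: the profile records how often each side is taken,
// which GCC can use for block layout and inlining decisions.
double process_sample(double x) {
    if (x < 0.0) {
        return -std::sqrt(-x);  // assumed rare path
    }
    return x * 0.5;             // assumed common path
}

double total(const std::vector<double>& samples) {
    double sum = 0.0;
    for (double x : samples) sum += process_sample(x);
    return sum;
}
```

The profile is only as good as the training run, so the input used in the second step should resemble what users will actually run.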

You should also give the compiler hints about the architecture on which the program will run. For example, if it will only run on a server and you can compile it on that same machine, you can just use -march=native. Otherwise you need to determine which features all your users will have and pass the corresponding -march value to GCC.

(Apparently you're targeting 64-bit, so GCC will probably already include more optimizations than for generic x86.)
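GCC defines predefined macros for the instruction-set extensions enabled by -march/-m flags, so code can check at compile time what a given build is allowed to use. A minimal sketch, assuming an x86 target (on other architectures none of these macros are defined and the function returns the fallback string):

```cpp
// Reports the most capable x86 vector extension enabled for this build.
// The result depends on the -march/-m flags used at compile time, not on the
// machine the binary eventually runs on.
const char* compiled_simd_level() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline (no x86 SIMD macros defined)";
#endif
}
```

A default 64-bit x86 build typically reports SSE2, since SSE2 is part of the x86-64 baseline; building with -march=native on a recent machine reports a higher level, and that binary may then fail on older CPUs.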

Bastien Léonard
+1  A: 

gcc -O3 is not guaranteed to produce the fastest code; -O2 is often a better starting point. After that, try profile-guided optimization and specific options: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

It's a long read, but probably worth it.

Note that the equivalent of MSVC's "Link Time Code Generation", called "Link Time Optimization" (-flto, passed at both compile and link time), is available in GCC 4.5+.

By the way, there is no specific "fastcall" calling convention for Win64. There is only "the" calling convention: http://msdn.microsoft.com/en-us/magazine/cc300794.aspx

rubenvb