I have two algorithms written in C++. As far as I know, it is conventional to compile with -O0 -DNDEBUG (g++) when comparing the performance of two algorithms (asymptotically they are the same). But I think that optimization level is unfair to one of them, because it uses the STL throughout. The implementation that uses plain arrays runs about 5 times faster than the STL-heavy one when compiled with -O0, yet the difference nearly disappears when I compile both with -O2 -DNDEBUG.

Is there any way to get the best out of the STL (I am taking a heavy performance hit in vector's operator[]) at optimization level -O0?

What optimization level (and preprocessor definitions like -DNDEBUG) do you use when comparing two algorithms?

It would also be a great help if someone could give some idea of the current practice in academic research for comparing the performance of algorithms written in C++.

EDIT::

OK, to isolate the problem of the optimization level, I am now using one algorithm but two different implementations.

I have changed one of the functions from raw pointers (int and bool arrays) to std::vector<int> and std::vector<bool>. With -O0 -DNDEBUG the times are 5.46 s (raw pointers) and 11.1 s (std::vector); with -O2 -DNDEBUG they are 2.02 s (raw pointers) and 2.21 s (std::vector). It is the same algorithm: one implementation uses four or five dynamic arrays of int and bool, and the other uses std::vector<int> and std::vector<bool> instead. They are identical in every other respect.

You can see that at -O0 the raw pointers outperform std::vector by roughly a factor of two, while at -O2 they are almost the same.
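Here is a minimal sketch of the kind of comparison I mean - the array size, the dummy summing loop, and the function names are made up purely for illustration, not my actual algorithm:

```
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

// Hypothetical stand-ins for the two implementations: one walks a plain
// dynamic array, the other walks a std::vector through operator[].
int sum_raw(const int* a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];                      // plain pointer indexing
    return s;
}

int sum_vec(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];                      // operator[] -- the call that hurts at -O0
    return s;
}

int main() {
    const int n = 10000000;
    int* raw = new int[n];
    std::vector<int> vec(n);
    for (int i = 0; i < n; ++i) { raw[i] = i; vec[i] = i; }

    std::clock_t t0 = std::clock();
    volatile int r1 = sum_raw(raw, n);  // volatile keeps the result from being optimized away
    std::clock_t t1 = std::clock();
    volatile int r2 = sum_vec(vec);
    std::clock_t t2 = std::clock();

    std::printf("raw: %ld ticks, vector: %ld ticks (results %d %d)\n",
                (long)(t1 - t0), (long)(t2 - t1), (int)r1, (int)r2);
    delete[] raw;
    return 0;
}
```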

But I am really confused, because when running times of algorithms are published in academic work, the programs seem to be compiled with -O0.

Are there some compiler options I am missing?

+5  A: 

The compiler optimizations usually won't change the complexity order of an algorithm, just the constant and the linear scale factor. Compilers are fairly smart, but they're not that smart.

Are you going to be compiling your code for release with just -O0? Probably not. You might as well compare the performance of the algorithms when compiled with whatever compilation flags you actually intend to use.

Boojum
+1  A: 

I see no reason not to compile and run them both at -O2. Unless you're doing it as a purely academic exercise (and even then it's very unlikely the optimizations would produce fundamental changes in the properties of the algorithm - though I think I'd be happy if GCC started turning O(N) source into O(lg N) assembly), you'll want information that's consistent with what you would get when actually running the final program. You most likely won't be releasing the program built at -O0, so you don't want to compare the algorithms at -O0 either.

Falaina
+5  A: 

It depends on what you want to optimize for.

Speed

I suggest using -O2 -DNDEBUG -ftree-vectorize, and if your code is designed to run specifically on x86 or x86_64, add -msse2. This will give you a broad idea of how it will perform after GCC's GIMPLE-level optimizations.

Size

I believe you should use -Os -fno-rtti -fno-exceptions -fomit-frame-pointer. This will minimize the size of the executable to a degree (assuming C++).


In both cases, the algorithm's speed is not compiler dependent, but a compiler can drastically change the way the code behaves if it can "prove" that a transformation is safe.

GCC detects 'common' code such as hand-coded min() and max() and turns it into a single SSE instruction (on x86/x86_64 when -msse is set) or a cmov when i686 is available (SSE takes priority). GCC will also take the liberty of reordering loops, unrolling and inlining functions if it wants to, and even removing useless code.
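For example, a tiny hand-coded min/max of this sort (an illustration only, not code from the question) is exactly the kind of thing the optimizer recognizes:

```
#include <cstdio>

// Hand-coded min/max of the kind GCC's optimizer recognizes; at -O2 with
// -msse the float version typically becomes a branchless minss, and the
// int version a cmov, instead of a compare-and-jump.
static inline float my_minf(float a, float b) { return a < b ? a : b; }
static inline int   my_maxi(int a, int b)     { return a > b ? a : b; }

int main() {
    std::printf("%f %d\n", my_minf(1.5f, 2.5f), my_maxi(3, 7));
    return 0;
}
```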

As for your latest edit:

You can see that at -O0 the raw pointers outperform std::vector by roughly a factor of two, while at -O2 they are almost the same.

That's because std::vector still contains code that throws exceptions and may use RTTI. Try comparing with -O2 -DNDEBUG -ftree-vectorize -fno-rtti -fno-exceptions -fomit-frame-pointer, and you'll see that std::vector comes out slightly better than your code. GCC knows what the 'built-in' types are and how to exploit them in real-world use, and will gladly do so - just as it knows what memset() and memcpy() do and how to optimize accordingly when the copy size is known.
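As a small illustration (again, not the question's code): operator[] is unchecked and throws nothing, so it boils down to pointer arithmetic once inlined, while at() keeps a bounds check and an exception-throwing path alive in the generated code - this is the sort of code the exception-related flags are about:

```
#include <cstddef>
#include <cstdio>
#include <vector>

// operator[]: no check, fully inlinable.
int sum_unchecked(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// at(): bounds check plus a path that can throw std::out_of_range.
int sum_checked(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v.at(i);
    return s;
}

int main() {
    std::vector<int> v(1000, 1);
    std::printf("%d %d\n", sum_unchecked(v), sum_checked(v));
    return 0;
}
```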

LiraNuna
The main focus is to compare, not to optimize. But I guess modern libraries are written in a way that makes them look slower than hand-written code when optimization is off.
hasan
The STL is very heavy when not optimized. GCC will generate code *as it sees it* and will not spend a second on optimizing. The STL is designed to cover a lot of cases, so specialized code is likely to outperform it - but if you write the *exact same* idea behind it, the STL will win, for the reasons stated in the answer.
LiraNuna
std::vector::at may legitimately throw an exception, and any function that causes reallocation may also throw. `-fno-rtti` and `-fno-exceptions` cause gcc to depart from conforming C++ support and will break code.
Charles Bailey
It will not break code. If GCC sees that the code relies on exceptions, it will refuse to compile it; the same goes for -fno-rtti. If your code doesn't have `try {} catch {}` anywhere, it's "safe" for the compiler not to throw a physical exception - but to do the equivalent of `puts` and `exit(1)` instead.
LiraNuna
For me, not compiling counts as breaking code!
Charles Bailey
How can you break code if it cannot be executed? GCC is very smart, don't underestimate it because of what you've read about it 7 years ago.
LiraNuna
When I'm writing code, if it doesn't compile, then it's broken. I don't recall exactly what I might have read about gcc 7 years ago, but I'm not claiming that it isn't smart. What is true is that if you turn off RTTI and exceptions then gcc no longer acts as a C++ compiler. In particular the behaviour of `std::vector` will change. What effect this has depends on the exact nature of the client code.
Charles Bailey
OP wanted speed. Do you think all code is clean and conforms to the spec? What about IEEE float exploits? Sometimes hacks are good - as long as you can prove they are. I trust GCC to produce the best code it can. I have worked with GCC for over 8 years now, and while I can still write better assembly than it does, I know how code should be structured to optimize well and how most of the flags behave.
LiraNuna
I have both exceptions and dynamic_cast. But I will remember this; when I get some free time, I will try to make my code free of them. (I already tried, but it seems catch(std::bad_alloc) is used in almost every block.) But is this what makes std::vector slower? I thought it was the indirect way of accessing the information (mostly const member functions, which are usually optimized down to simple value reads).
hasan
+1  A: 

You have two algorithms implemented in C++. If you want to compare the relative performance of the two implementations then you should use the optimization level that you are going to use in your final product. For me, that's -O3.

If you want to analyse the complexity of an algorithm, then that's more of an analysis problem where you look at the overall count of operations that must be performed for different sizes and characteristics of inputs.
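For instance (a sketch with made-up names, not a prescription): count the dominant operation instead of timing, and watch how the count grows with the input size:

```
#include <cstddef>
#include <cstdio>

// Count the dominant operation (element comparisons, here) rather than
// wall-clock time; the growth of the count with n is what the complexity
// analysis cares about, independent of compiler flags.
static long comparisons = 0;

void insertion_sort(int* a, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        int key = a[i];
        std::size_t j = i;
        // each evaluation of the guard counts one element comparison
        while (j > 0 && (++comparisons, a[j - 1] > key)) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}

int main() {
    for (std::size_t n = 1000; n <= 8000; n *= 2) {
        int* a = new int[n];
        for (std::size_t i = 0; i < n; ++i)
            a[i] = (int)(((n - i) * 2654435761u) % n);   // arbitrary scrambled input
        comparisons = 0;
        insertion_sort(a, n);
        std::printf("n=%lu comparisons=%ld\n", (unsigned long)n, comparisons);
        delete[] a;
    }
    return 0;
}
```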

As a developer writing code where performance is an issue, it is a good idea to be aware of the range of optimizations that a compiler can, and is likely to, apply to your code. Not optimizing unfairly penalises code that is written clearly and designed to be easily optimized, when compared against code that is already 'micro-optimized'.

Charles Bailey
`-O3` is **very** aggressive. It can break legal code.
LiraNuna
`-O3` is not allowed to break legal code, that would constitute a compiler bug. Do you have a reference?
Charles Bailey
Again, the main focus is not to optimize, but to compare.
hasan
I used to compile without optimization for all comparisons, but use of STL is kind of forcing me to use optimization.
hasan
A fair comparison is one that uses fair conditions. C++ is designed to be optimized. Non-optimized code is beneficial for speed of compiling and ease of debugging. Comparing the performance of non-optimized code isn't helpful as it doesn't reflect real world performance situations.
Charles Bailey
@LiraNuna - I'd like to see a reference for -O3 breaking code as well. The only code I've ever seen broken by -O3 was something that was wrong to begin with, but was unlucky enough to "appear" correct at lower optimization levels.
Tom
A: 

Such a comparison is less about fairness than about producing useful information. You should use the optimization level that you plan to use when/if the code is put into production. If you're basically doing research, and don't personally plan to put it into production, you're stuck with the slightly more difficult job of guessing what somebody who would put it into production would probably do.

Realistically, even if you are doing development, not research, you're stuck with a little of that anyway -- it's nearly impossible to predict what optimization level you might eventually use with this particular code.

Personally, I usually use -O2 with gcc. My general rule of thumb is to use the lowest level of optimization that turns on automatic inlining. I write a lot of my code with the expectation that small functions will be inlined by the compiler - and write the code specifically to assist in that (e.g. often using functors instead of functions). If the compiler isn't set to inline those functions, you're not getting what I really intended, and the performance of the code compiled that way doesn't really mean anything - I certainly would not plan on ever really using it that way.
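For example (a toy illustration, not code from any real project of mine): a comparator functor like the one below is trivial for the compiler to inline into std::sort once automatic inlining kicks in, while the function-pointer version usually keeps an indirect call for every comparison:

```
#include <algorithm>
#include <cstdio>
#include <vector>

// The functor's operator() can disappear entirely into std::sort's
// instantiation once inlining is on; the function pointer usually cannot.
struct Descending {
    bool operator()(int a, int b) const { return a > b; }
};

bool descending_fn(int a, int b) { return a > b; }

int main() {
    std::vector<int> a(100), b(100);
    for (int i = 0; i < 100; ++i)
        a[i] = b[i] = (i * 37) % 100;

    std::sort(a.begin(), a.end(), Descending());   // comparator easily inlined
    std::sort(b.begin(), b.end(), descending_fn);  // comparator called through a pointer

    std::printf("%d %d\n", a.front(), b.front());
    return 0;
}
```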

Jerry Coffin