I was changing my for loop to increment using ++i instead of i++ and got to thinking, is this really necessary anymore? Surely today's compilers do this optimization on their own.

In this article from 1997, http://leto.net/docs/C-optimization.php, Michael Lee goes into other optimizations such as inlining, loop unrolling, loop jamming, loop inversion, strength reduction, and many others. Are these still relevant?

What low level code optimizations should we be doing, and what optimizations can we safely ignore?

Edit: This has nothing to do with premature optimization. The decision to optimize has already been made. Now the question is what is the most effective way to do it.

Anecdote: I once reviewed a requirements spec that stated: "The programmer shall left shift by one instead of multiplying by 2".

+1  A: 

Only if you know for certain that they're relevant. This means either you've investigated this issue before on your particular compiler or you've already done the following:

  1. produced functional code
  2. profiled that code
  3. identified bottlenecks
  4. simplified design to eliminate bottlenecks
  5. chosen algorithms that minimize calls into bottlenecks

If you've done all of these things then often the best thing to do is get your compiler to emit something lower level that you can examine yourself (like assembly) and make specific judgments based on that. In my experience every compiler is a little different. Sometimes an optimization on one causes another to produce larger, less efficient code.

If you haven't already done these things then I'd call it premature optimization and I'd recommend against it. Optimizing before doing these things gives rewards which are disproportionately small when compared to the cost involved.

Waylon Flinn
It actually is not premature optimization when it comes to user defined increment operators (e.g. iterator operators), where i++ implies copying the original object, which might be costly.
Cătălin Pitiș
This has nothing to do with premature optimization. The decision to optimize has already been made. Now the question is what is the most effective way to do it.
Mark Beckwith
@cpitis Or it might not be, especially when you consider that you populate twice as many of the same objects with data from a database you access over a slow network. Seems better to write the code and let the execution time decide if it's an issue.
Waylon Flinn
@Mark The answer I've found most useful is just to look at the assembly emitted by your compiler. But only after I've tried everything else. Your question seems to be a much more general question about the current state-of-the-art in compilers than I had originally imagined.
Waylon Flinn
Making a choice at design time between two otherwise equivalent options, of which one is known to be faster, is hardly a premature optimization. There is no cost associated with preferring ++i, so use that. Don't wait until it becomes a performance issue, because by then the optimization is no longer free (you'd have to go back and change existing code, when you could have written it correctly to begin with, in the same amount of time).
jalf
It's not premature if you *know for certain in advance* that it's an optimization and it doesn't take you any time to choose. Habits are not optimizations. As I've pointed out elsewhere in this question whether or not it's premature is at least partially a function of programmer cost.
Waylon Flinn
+2  A: 

The compiler is in a better position to judge and make such decisions. Micro-optimizations you make may get in its way and ultimately defeat the purpose.

dirkgently
I agree, that's why pre-optimizing is often a bad idea.
José Joel.
+1  A: 

I wanted to add a little something. This "premature optimization is bad" mantra is kind of rubbish. What do you do when you select an algorithm? You probably take the one with the best time complexity - OMG, premature optimization. Yet everyone seems fine with this. So it seems like the real attitude is "premature optimization is bad - unless you do it my way". At the end of the day, do whatever you need to do to make the app you need to make.

"The programmer shall left shift by one instead of multiplying by 2". hope you dont want to multiply floats or negative numbers then ;)

Michael
When I select an algorithm or data structure, my first choice is the one that is easiest to code or use - I don't consider performance until later.
anon
You also need to remember that big-O doesn't matter for small N. But I do agree with you, the phrase needs to be clarified. Fast by design is always preferred - this phrase can be used to mean not even thinking about performance until very late, which implies that you don't even have a plan to handle performance issues.
Michael
If it's a matter of selecting an algorithm in most cases, I wouldn't call it optimization. I'd call it design. It's about the trade-off between time invested and actual speed improvements. It's also about ambiguity. You don't know exactly how every semantically equivalent syntactic construction performs until you try them and the search space is very large. On the other hand, it's easy to find complexities for common algorithms.
Waylon Flinn
OK, maybe I have just misunderstood what people meant by "optimization". I very much agree with you that "Fast by design is always preferred".
Michael
+10  A: 

Those optimizations are still relevant. Regarding your example, choosing between ++i and i++ on a built-in arithmetic type makes no difference.

In case of user defined increment/decrement operators, ++i is preferable, because it doesn't imply copying the incremented object.

So a good coding style is to use prefix increment/decrement in for loops.
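
For illustration, a minimal sketch (the container, element type, and function name here are just placeholders):

#include <iostream>
#include <list>
#include <string>

void print_all(const std::list<std::string>& names)
{
    // Prefix ++it modifies the iterator in place. Postfix it++ must
    // construct and return a copy of the old iterator, which the
    // optimizer may or may not manage to eliminate.
    for (std::list<std::string>::const_iterator it = names.begin();
         it != names.end(); ++it)
    {
        std::cout << *it << std::endl;
    }
}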

Cătălin Pitiș
++i and i++ can make a pretty big difference if "i" is an stl iterator. One has to return (a copy of) the original iterator, the other doesn't.
tylerl
that's basically what he said...
Evan Teran
+9  A: 

In general, no. Compilers are much better at doing small, straightforward micro-optimizations like this across your entire code base. Ensure that you are enabling your compiler here, by compiling your release version with the right optimization flags. If you use Visual Studio, you might want to experiment with favoring size over speed (there are a lot of cases where small code is faster), link-time code generation (LTCG, which enables the compiler to do cross-compiland optimizations), and maybe even profile-guided optimization.

You also need to remember that the vast bulk of your code won't matter from a performance perspective - optimizing this code will have no user visible effect.

You need to define your performance goals early on and measure frequently to make sure you're meeting them. When outside of your goals, use tools such as profilers to determine where the hot spots are in your code and optimize those.

As another poster here mentioned, "optimization without measuring and understanding isn't optimization at all - it's just random change."

If you have measured and determined that a particular function or loop is a hotspot, there are two approaches to optimize it:

  • First, optimize it at a higher level by reducing invocations of the costly code. This will usually lead to the most benefits. Algorithm-level improvements fall into this category - an algorithm with a better big-O should result in running the hotspot code less.
  • If calls cannot be reduced, then you should consider micro-optimizations. Look at the actual machine code that the compiler is emitting and determine what it is doing that is the most costly - if it turns out copying temporary objects is occurring, then consider prefix ++ over postfix. If it's doing an unnecessary comparison at the beginning of the loop, flip the loop into a do/while (see the sketch below), and so on. Without understanding why the code is slow, any blanket micro-optimizations are next to useless.
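
A sketch of the do/while flip mentioned in the second point (the fill function is made up, and the inverted form assumes the caller guarantees count > 0):

// Plain form: the condition is also tested before the first iteration.
void fill_while(int* dst, int value, int count)
{
    int i = 0;
    while (i < count) {
        dst[i] = value;
        ++i;
    }
}

// Inverted form: one test fewer per call, but only valid when the
// caller guarantees count > 0 (otherwise it writes out of bounds).
void fill_do(int* dst, int value, int count)
{
    int i = 0;
    do {
        dst[i] = value;
        ++i;
    } while (i < count);
}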
Michael
+2  A: 

Make it right, then make it fast - based on measurement of performance.

Choose algorithms well and implement them in the MOST READABLE way possible. Trade off readability for performance only when you MUST - that is, when your users say that performance is unacceptable, either in words or by their actions.

As Donald Knuth/Tony Hoare said "premature optimization is the root of all evil" - still true now 30 years later...

Gordon Guthrie
'by their actions' can be too late if their actions are to abandon you for a competitor's product, as happened to SGI a few years ago.
Pete Kirkham
Sure - but 'by their actions proactively' - release software to users and examine their actions and act on it. Performance measurements are always people-orientated and should never be technically orientated - user-perceived response times are what matter.
Gordon Guthrie
+2  A: 

Last time I tested ++it and it++ on the Microsoft C++ compiler for STL iterators, ++it emitted less code, so if you're in a massive loop you may get a small performance gain using ++it.

For integers etc the compiler will emit identical code.

+6  A: 

All of the optimizations you listed are practically irrelevant these days for C programmers -- the compiler is much, much better at performing things like inlining, loop unrolling, loop jamming, loop inversion, and strength reduction.

Regarding ++i versus i++: for integers, they generate identical machine code, so which one you use is a matter of style/preference. In C++, objects can overload those pre- and postincrement operators, in which case it's usually preferable to use a preincrement, because a postincrement necessitates an extra object copy.

As for using shifts instead of multiplications by powers of 2, again, the compiler already does that for you. Depending on the architecture, it can do even more clever things, such as turning a multiplication by 5 into a single lea instruction on x86. However, with divisions and moduli by powers of 2, you might need to pay a little more attention to get the optimal code. Suppose you write:

x = y / 2;

If x and y are signed integers, the compiler can't turn that into a right shift, because it would yield an erroneous result for negative numbers. So it emits a right shift and some bit-twiddling instructions to make sure the result is correct for both positive and negative numbers. If you know x and y are always positive, then you should help the compiler out and make them unsigned integers instead. Then the compiler can optimize it into a single right shift instruction.

The modulus operator % works similarly -- if you're modding by a power of 2, with signed integers the compiler has to emit an and instruction plus a little more bit twiddling to make the result correct for positive and negative numbers, but it can emit a single and instruction if dealing with unsigned numbers.
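
A sketch of the difference (the function names are made up; the exact code generated depends on your compiler and target):

// With signed operands the compiler must emit a shift plus fix-up code,
// because an arithmetic right shift rounds toward negative infinity
// while integer division rounds toward zero (-3 / 2 is -1, but
// -3 >> 1 is -2 on typical hardware).
int half_signed(int y)                 { return y / 2; }  // shift + fix-up
unsigned half_unsigned(unsigned y)     { return y / 2; }  // single shift

// The same applies to modulus by a power of two.
int low_bits_signed(int y)             { return y % 8; }  // and + fix-up
unsigned low_bits_unsigned(unsigned y) { return y % 8; }  // single and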

Adam Rosenfield
A: 

As others have said, ++i may be more efficient than i++ if i is an instance of some object. That difference may or may not be significant to you.

However, in the context of your question about whether the compiler can do these optimisations for you, in your chosen example it can't. The reason is that ++i and i++ have different meanings - which is why they are implemented as different functions. i++ has to do extra work (copy the current state before incrementing, do the increment, then return that state). If you don't need that extra work, then why choose that form over the more direct form? The answer might be readability - but it has become idiomatic in C++ to write ++i in these cases, so I don't believe readability comes into it.

So, given the choice, I would always prefer the more direct form over code that does unnecessary extra work (which may or may not be significant) with no benefit in itself. This is not a premature optimisation. On the other hand, it's not usually significant enough to get religious about either.
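
The conventional shape of the two overloads makes the extra work explicit (a generic sketch, not any particular library's iterator):

class Counter
{
    int value_;
public:
    explicit Counter(int v = 0) : value_(v) {}

    // Prefix: increment in place, return *this by reference. No copy.
    Counter& operator++()
    {
        ++value_;
        return *this;
    }

    // Postfix: save a copy of the old state, increment, return the copy
    // by value. That copy is the "extra work" being discussed.
    Counter operator++(int)
    {
        Counter old(*this);
        ++value_;
        return old;
    }
};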

Phil Nash
+21  A: 

This is a well-worn subject, and SO contains oodles of good and bad advice on it.

Let me just tell you what I have found from much experience doing performance tuning.

There are tradeoff curves between performance and other things like memory and clarity, right? And you would expect that to get better performance you would have to give something up, right?

That is only true if the program is on the tradeoff curve. Most software, as first written, is miles away from the tradeoff curve. Most of the time, it is irrelevant and ignorant to talk about giving up one thing to get another.

The method I use is not measurement, it is diagnosis. I don't care how fast various routines are or how often they are called. I want to know precisely which instructions are causing slowness, and why.

The dominant and primary cause of poor performance in good-sized software efforts (not little one-person projects) is galloping generality. Too many layers of abstraction are employed, each of which extracts a performance penalty. This is not usually a problem - until it is a problem - and then it's a killer.

So what I do is tackle one issue at a time. I call these "slugs", short for "slowness bugs". Each slug that I remove yields a speedup of anywhere from 1.1x to 10x, depending on how bad it is. Every slug that is removed makes the remaining ones take a larger fraction of the remaining time, so they become easier to find. In this way, all the "low-hanging fruit" can be quickly disposed of.

At that point, I know what is costing the time, but the fixes may be more difficult, such as partial redesign of the software, possibly by removing extraneous data structures or using code generation. If it is possible to do this, that can set off a new round of slug-removal until the program is not only many times faster than it was to begin with, but smaller and clearer as well.

I recommend getting experience like this for yourself, because then when you design software you will know what not to do, and you will make better (and simpler) designs to begin with. At the same time, you will find yourself at odds with less experienced colleagues who can't begin thinking about a design without conjuring up a dozen classes.

ADDED: Now, to try to answer your question, low-level optimization should be done when diagnosis says you have a hot-spot (i.e. some code at the bottom of the call stack appears on enough call stack samples (10% or more) to be known to be costing significant time). AND if the hot-spot is in code you can edit. If you've got a hot-spot in "new", "delete", or string-compare, look higher up the stack for things to get rid of.

Hope that helps.

Mike Dunlavey
It doesn't help. It's a very nice answer, just like Neil's, but it has nothing to do with the question I asked. See jalf's answer below for more of what I was looking for.
Mark Beckwith
Sorry. I guess you hit my "general" button. I meet way too many programmers who worry about things like ++i, or compiler optimization, or speed of virtual functions, when their design is causing 30x slowdown.
Mike Dunlavey
No prob. It looks like I hit a lot of those "general" buttons :-) Had I known I would have phrased the question differently.
Mark Beckwith
Phrasing questions on SO is an art. Your question is phrased a lot better than a couple I've done, and gotten smacked for it.
Mike Dunlavey
+6  A: 

Yes, those things are still relevant. I do a fair bit of that kind of optimization but, to be fair, I'm mostly writing code that has to do relatively complex things in about 10ms on an ARM9. If you're writing code that's running on more modern CPUs then the benefits won't be as great.

If you don't care about portability, and you're doing a fair bit of maths, then you might also look at using whatever vector operations are available on your target platform - SSE on x86, Altivec on PPC. Compilers can't easily use these instructions without a lot of help, and the intrinsics are pretty easy to use these days. Another thing that's not mentioned in the document you linked to is pointer aliasing. You can sometimes get good speed improvements if your compiler has support for some kind of "restrict" keyword. Plus, of course, thinking about cache usage can be important. Reorganizing your code and data in a way that makes good use of the cache can result in dramatic speed increases compared to optimizing away the odd copy or unrolling a loop.
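
As an example of the aliasing point, a sketch (restrict is standard C99; most C++ compilers accept __restrict or __restrict__ as an extension):

// Without the no-aliasing promise, the compiler has to allow for dst
// and src overlapping, which can block vectorization or force extra
// runtime overlap checks. __restrict tells it they never alias.
void scale(float* __restrict dst, const float* __restrict src,
           float factor, int count)
{
    for (int i = 0; i < count; ++i)
        dst[i] = src[i] * factor;
}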

As ever, though, the most important thing is to profile. Only optimize code that's actually slow, make sure your optimization actually makes it quicker, and look at the disassembly to see what optimizations the compiler is already doing for you before you try to improve it.

James Sutherland
+5  A: 

Bad example – the decision whether to use ++i or i++ doesn't involve any kind of trade-off! ++i has (or may have) a net benefit without any downsides. There are many similar scenarios, and any discussion in these realms is a waste of time.

That said, I believe it's very important to know to what extent the target compiler is capable of optimizing small code fragments. The truth is: modern compilers are (sometimes surprisingly!) good at it. Jason has an incredible story concerning an optimized (non-tail recursive) factorial function.

On the other hand, compilers can be surprisingly stupid as well. The key is that many optimizations require control flow analysis, which becomes NP-complete. Every optimization thus becomes a trade-off between compilation time and usefulness. Often, the locality of an optimization plays a crucial role, because the computation time required to perform the optimization grows too quickly once the amount of code the compiler has to consider increases by just a few statements.

And as others have said, these minute details are still relevant and always will be (for the foreseeable future). Although compilers get smarter all the time and machines get faster, the size of our data grows too – in fact, we're losing this particular battle; in many fields, the amount of data grows much faster than computers get better.

Konrad Rudolph
++ The bee in my bonnet is low-level and compiler optimization DOESN'T MATTER unless you're coding up an authentic hotspot - i.e. a loop without function calls that takes a significant % of time. And algorithms? Fine, but that's not enough. The REAL killer is layers of "general" complexity that lead your poor program counter on endless nested side-trip goose-chases.
Mike Dunlavey
+1  A: 

Sure, if and only if it results in an actual improvement for that particular program which is significant enough to be worth the coding time, any readability decrease, etc. I don't think you can make a rule for this across all programs, or really for any optimization. It's entirely dependent on what actually matters in the particular case.

With something like ++i, the time and readability tradeoff is so minor, it may well be worth making a habit of, if it actually results in improvement.

Kim Reece
+10  A: 

If there is no cost to the optimization, do it. When writing the code, ++i is just as easy to write as i++, so prefer the former. There is no cost to it.

On the other hand, going back and making this change afterwards takes time, and it most likely won't make a noticeable difference, so you probably shouldn't bother with it.

But yes, it can make a difference. On built-in types, probably not, but for complex classes, the compiler is unlikely to be able to optimize it away. The reason for this is that the increment operation is no longer an intrinsic operation, built into the compiler, but a function defined in the class. The compiler may be able to optimize it like any other function, but it cannot, in general, assume that pre-increment can be used instead of post-increment. The two functions may do entirely different things.

So when determining which optimizations can be done by the compiler, consider whether it has enough information to perform it. In this case, the compiler doesn't know that post-increment and pre-increment perform the same modifications to the object, so it can not assume that one can be replaced with the other. But you have this knowledge, so you can safely perform the optimization.

Many of the others you mention can usually be done very efficiently by the compiler: Inlining can be done by the compiler, and it's usually better at it than you. All it needs to know is how large a proportion of the function consists of function call overhead, and how often it is called. A big function that is called often probably shouldn't be inlined, because you end up copying a lot of code, resulting in a larger executable and more instruction cache misses. Inlining is always a tradeoff, and often the compiler is better at weighing all the factors than you.

Loop unrolling is a purely mechanical operation, and the compiler can do that easily. Same goes for strength reduction. Swapping inner and outer loops is trickier, because the compiler has to prove that the changed order of traversal won't affect the result, which is difficult to do automatically. So here is an optimization you should do yourself.
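
A sketch of the kind of swap meant here (a row-major array summed in two traversal orders; the result is the same, but the cache behavior is not):

const int ROWS = 1024;
const int COLS = 1024;

// Column-by-column traversal of a row-major array: each access jumps
// COLS * sizeof(int) bytes, so nearly every access misses the cache.
long long sum_columns_first(const int (&m)[ROWS][COLS])
{
    long long total = 0;
    for (int c = 0; c < COLS; ++c)
        for (int r = 0; r < ROWS; ++r)
            total += m[r][c];
    return total;
}

// Swapped loops: the same result, but the accesses are now sequential
// in memory and cache-friendly.
long long sum_rows_first(const int (&m)[ROWS][COLS])
{
    long long total = 0;
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            total += m[r][c];
    return total;
}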

But even in the simple ones that the compiler is able to do, you sometimes have information your compiler doesn't. If you know that a function is going to be called extremely often, even if it's only called from one place, it may be worth checking whether the compiler automatically inlines it, and do it manually if not.

Sometimes you may know more about a loop than the compiler does as well (for example, that the number of iterations will always be a multiple of 4, so you can safely unroll it 4 times). The compiler may not have this information, so if it were to unroll the loop, it would have to insert an epilogue to ensure that the last few iterations get performed correctly.
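
For instance (a sketch; the function and the multiple-of-4 guarantee are assumptions the compiler cannot see):

// The caller guarantees count is a multiple of 4, so no epilogue is
// needed to mop up leftover iterations. The compiler cannot know that,
// so its own unrolled version would have to generate one.
void add_arrays(float* dst, const float* a, const float* b, int count)
{
    for (int i = 0; i < count; i += 4) {
        dst[i]     = a[i]     + b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
}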

So such "small-scale" optimizations can still be necessary, if 1) you actually need the performance, and 2) you have information that the compiler doesn't.

You can't outperform the compiler on purely mechanical optimizations. But you may be able to make assumptions that the compiler can't, and that is when you're able to optimize better than the compiler.

jalf
+1  A: 
  • I don't generally optimize lower than the O(f(n)) complexity unless I'm writing on an embedded device.

  • For typical g++/Visual Studio work, I presume that the basic optimizations are going to be reliably made (at least when optimization is requested). For less mature compilers, that assumption is presumably not valid.

  • If I was doing heavy maths work on streams of data, I'd check the compiler's ability to emit SIMD instructions.

  • I'd rather tune my code around different algorithms than a specific version of a specific compiler. Algorithms will stand the test of multiple processors/compilers, whereas if you tune for the first release of Visual C++ 2008, your optimizations may not even work next year.

  • Certain optimization tricks that were very reasonable on older computers prove to have issues today. E.g., the ++/-- operators were designed around an older architecture that had an increment instruction that was very fast. Today, if you do something like

    for(int i = 0; i < top; i+=1)

    I would presume that the compiler would optimize i+=1 into an inc instruction (if the CPU had it).

  • The classic advice is to optimize top-down.

Paul Nathan
+2  A: 

An interesting observation I've had over the years is that optimized code of one generation back often turns out to be counter-optimized in the following generation. This is due, for example, to processor implementations changing so that if/else becomes a bottleneck on modern CPUs, where pipelines are deep. I would say clean, short, concise code is often the best end result. Where optimization really counts is in the data structures: getting them right and slim.
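
One illustration of the effect (a sketch; actual results vary a lot by compiler and CPU): a branch-free bit trick that once paid off can now be no faster than the straightforward code, which many compilers turn into a conditional move anyway.

// Straightforward version: a modern compiler will often emit a
// conditional move, which has no branch to mispredict.
int min_plain(int a, int b) { return a < b ? a : b; }

// "Clever" branch-free version from an earlier era: -(a < b) is all
// ones when a < b and zero otherwise, selecting a or b with masks.
// Today this is often no faster, and it is certainly harder to read.
int min_trick(int a, int b) { return b ^ ((a ^ b) & -(a < b)); }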

akauppi
+1  A: 

There are three quotes that I believe every developer should know with regard to optimization - I first read them in Josh Bloch's book "Effective Java":

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.

(William A. Wulf)

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

(Donald E. Knuth)

We follow two rules in the matter of optimization:

Rule 1: Don't do it.

Rule 2: (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution.

(M. A. Jackson)

All these quotes are (AFAIK) at least 20-30 years old, from a time when CPU and memory meant much more than they do today. I believe the right way to develop software is first to have a working solution, and then use a profiler to find the performance bottlenecks. A friend once told me about an application that was written in C++ and Delphi and had performance issues. Using a profiler, they found out that the application spent a considerable amount of time converting strings from Delphi's structure to the C++ one and vice versa - no micro-optimization could have caught that...

To conclude, don't think that you know where the performance issues will be. Use a profiler for this.

David Rabinowitz
A: 

First of all - always run profiling to check.

First, make sure you are optimizing the right part of the code. If that code accounts for only 1% of total run time, forget it: even speeding it up by 50% gains you a whole 0.5% overall - and unless you are doing something strange, the real gain will be smaller still (especially if you used a good optimizing compiler). Second, make sure you are optimizing it the right way. Which code would run faster on x86?

inc eax

or

add eax, 1

Well, as far as I know, on earlier processors the first one, but on the P4 the second one (it is irrelevant here whether those specific instructions actually run faster or slower; the point is that this changes all the time). The compiler may be kept up to date with such changes - you will not be.

In my opinion the primary target is the optimization that cannot be performed by the compiler - as mentioned earlier, the data size (you may think it doesn't matter on today's 2 GiB computers, but if your data is bigger than the processor cache, it will run much more slowly).

In general, do it only if you must and/or you know what you are doing. It requires an amount of knowledge about the code, the compiler and the low-level computer architecture that is not mentioned in the question (and which, to be honest, I do not possess), and it will likely gain nothing. If you want to optimize, do it at a higher level.

Maciej Piechotka
A: 

Certainly yes, because the compiler needs more resources to optimize unoptimized code than to optimize something already optimized. In particular, it causes the computer to consume a little more energy, which, small as it is, still has a bad impact on our already hurting environment. This is especially important for open-source code, which is compiled more often than closed-source code.

Go green, save the planet, optimize yourself

dmityugov
+2  A: 

Don't try to guess what your compiler is doing. If you've already determined that you need to optimize something at this level, isolate that bit and look at the generated assembly. If you can see that the generated code is doing something slow that can be improved, by all means fiddle with it at a code level and see what happens. If you really need control, rewrite that bit in assembly and link it in.

This is a pain in the ass, but the only way to really see what is happening. Be aware that all such tight optimizations may become useless once you change anything (different CPU, different compiler, even different cache, etc.) and it's a sunk cost.

simon
+2  A: 

One also needs to be careful that changing between pre- and post-increment/decrement operators doesn't introduce an undesirable side effect. For example, if you're iterating over a loop 5 times simply to run a set of code multiple times without any interest in the loop index value, you're probably okay (YMMV). On the other hand, if you do access the loop index value, then the result may not be what you expect:

#include <iostream>

int main()
{
  for (unsigned int i = 5; i != 0; i--)
    std::cout << i << std::endl;

  for (unsigned int i = 5; i != 0; --i)
    std::cout << "\t" << i << std::endl;

  for (unsigned int i = 5; i-- != 0; )
    std::cout << i << std::endl;

  for (unsigned int i = 5; --i != 0; )
    std::cout << "\t" << i << std::endl;
}

results in the following:

5
4
3
2
1
        5
        4
        3
        2
        1
4
3
2
1
0
        4
        3
        2
        1

The first two cases show no difference, but notice that attempting to "optimize" the third case by switching to a pre-decrement operator (giving the fourth case) would result in an iteration being completely lost. Admittedly this is a bit of a contrived case, but I have seen this sort of loop iteration (the 3rd case) when going through an array in reverse order, i.e. from end to start.

Void
I know you're making a valid point, but you're probably not the only one still talking about whether ++i or i++ is faster, in a loop containing `std::cout` :-)
Mike Dunlavey
... I just took 5 stackshots of your code. One was in `operator<< -> writepad -> strlen`. The other 4 were in `operator<< -> endl -> operator<< -> flush -> ostream::flush -> filebug::sync -> _write -> KERNEL32! -> KERNEL32! -> KERNEL32! -> NTDLL!` So far, at least, the `--i` operators are not on the radar.
Mike Dunlavey
@Mike Dunlavey: We're talking about two different things here. My point was about potential side effects of blindly changing post-increment and post-decrement operators to pre-increment and pre-decrement operators, respectively. I wasn't making any claims about which operators are faster.
Void
A: 

I still do things like ra<<=1; instead of ra*=2; and will continue to. But compilers (as bad as they are) and, more importantly, computers are so fast that these optimizations are often lost in the noise. As a general question, no, it is not worth it. If you are on a resource-limited platform (say a microcontroller) where every extra clock really counts, then you probably already do this, and probably do a fair amount of assembler tuning too. As a habit I try not to give the compiler too much extra work, but for code readability and reliability I don't go out of my way.

The bottom line for performance, though, has never changed: find some way to time your code, measure to find the low-hanging fruit, and fix it. Mike D. hit the nail on the head in his response. I have too many times seen people worry about specific lines of code without realizing that they are either using a poor compiler, or that by changing one compiler option they could see a several-fold increase in execution performance.

dwelch