views:

624

answers:

8

Using the latest gcc compiler, do I still have to think about these types of manual loop optimizations, or will the compiler take care of them for me well enough?

+21  A: 

Write the code, profile it, and only think about optimising it when you have found something that is not fast enough, and you can't think of an alternative algorithm that will reduce/avoid the bottleneck in the first place.

With modern compilers, this advice is even more important - if you write simple clean code, the compiler's optimiser can often do a better job of optimising the code than it can if you try to give it snazzy "pre-optimised" code.
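To make that concrete, here is a minimal sketch (the function and values are mine, not the OP's) of "simple clean code" that a modern optimiser handles on its own: the product `scale * bias` is loop-invariant, and GCC at -O2 computes it once before the loop, with no manual hoisting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The product scale * bias never changes inside the loop. Written
// this plainly, the optimiser hoists it; no need to do so by hand.
std::vector<int> scale_all(const std::vector<int>& v, int scale, int bias) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out.push_back(v[i] * (scale * bias));  // invariant subexpression
    return out;
}
```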

Jason Williams
Plus one. Perfect answer. We should focus on doing our job (write correct, readable, maintainable code) and let the compiler do its job (know the platform and produce semantically correct efficient code for that platform).
Jason
Also, don't forget to profile **after** optimization as well, to make sure that your optimizations are indeed optimizations.
JesperE
Loop hoisting is the moving of invariants from inside the loop to outside the loop. This is a good checklist item for code reviews, whether personal or team. I don't depend on compilers for this kind of optimization. This kind of optimization significantly reduced my program's execution time when it was checksumming 2GB of data.
Thomas Matthews
@Thomas: Indeed. You have demonstrated a perfect case where you knew a loop was too slow and optimised it *because you knew it to be necessary* (You had evidence of some form to prove that optimisation was needed). Now go and count how many loops in your program could be hoisted/optimised with no net gain to the end user.
Jason Williams
I would like to think the "premature optimization is the root of all evil" and "profile first" mantras have been beaten to death enough on this forum that they're a given; perhaps the person has already considered this, and simply wants his question about compiler features and quality answered rather than having its necessity debated.
Will Hartung
-1 for another bad-faith answer to an optimization question. Once again the premature optimization parrots come out and get voted up because no one on StackOverflow bothers thinking about performance.
tgamblin
@tgamblin: true, but IMO the questioner asked for it, by saying "do I have to think about these types of optimization?". Since the question contains no grounds to suspect that the questioner should be thinking about it, you get this answer. I once asked a question about a specific GCC optimisation issue. I didn't get much attention at all, but I did get answers and I got no generic waffle about how I should be developing my code. So it is possible (or used to be).
Steve Jessop
@Steve Jessop: That doesn't negate the fact that this reflexive non-answer has been repeated on here a million times. I have no problem with telling people to profile because it's a great rule of thumb. But look at, say, Norman's answer below. It mentions the need to profile but actually *teaches the OP something*. It's also the only answer with a concrete explanation of when you should care and when you shouldn't. This answer is just regurgitation and vagaries, and it gets upvoted by all the people who prefer their best practices spoon-fed.
tgamblin
@tgamblin: That doesn't negate the fact that hordes of people continue to ask pointless questions about micro-optimization. I like to think of myself as something of a professional. That means that, when somebody asks a dumb question I feel compelled to explain why it's a dumb question. I believe in disregarding micro-optimization until shown necessary about as deeply as I believe in meaningful variable names.
David Thornley
@David Thornley: It's not a micro-optimization if it's in a tight loop that you've actually discovered matters for performance. Should we have everyone who asks a performance question prove its necessity by having a committee review their motives? Even if the OP is prematurely optimizing, an informative answer would benefit someone who *isn't*. I'm sorry you've had to explain to so many people why their questions are dumb, but IMHO you should stop presuming peoples' motives. I believe in respecting people more deeply even than I believe in meaningful variable names.
tgamblin
@tgamblin: The question was "do I still have to think about these types [loop-invariant code motion] of manual loop optimizations?" not "what do I think about when looking for loop-invariant code motions?"
Jason
It's also not micro-optimization if it's in a loose loop, but your experience of your compiler is that it will make a total horlicks of the simplest expression of the logic. It took me all of one guess to come up with a loop that a specific compiler, that I've optimized code for before, couldn't hoist. Trust me, I didn't even compile the code before my first loop, let alone profile it. A profiler is not required to find bone-headed performance goofs, and the advice to profile first should pre-suppose that the code you write isn't utter rubbish in the first place.
Steve Jessop
... to stop it being utter rubbish in the first place, you have to think a little bit about performance, and have at least a vague idea what kinds of optimization compilers can (and can't) perform. I don't think programmers should waste a lot of time guessing what the fastest possible implementation is of some routine, but I do think they should code with certain basic knowledge of what their code will actually do.
Steve Jessop
As many have noticed, I did not answer the unasked question "what types of hoisting will a compiler do for me, and where will it fail?", I answered the OP: "should I worry about hoisting?", to which the correct answer is "not until you know the loop needs to be optimised". He knows what hoisting is (so I don't need to explain that), and by following my advice, the OP can try his loop with/without hoisting to see for himself which specific changes are beneficial with his particular code/compiler/platform combo. Which is a more useful programming skill than knowing "gcc version xyz does this"
Jason Williams
-1, too, for similar reasons
peterchen
This answer isn't wrong. If you feel it isn't complete, then why not add your own answer to add to the body of knowledge? That seems to me to be the point of SO. I've upvoted the other good answers.
Jason Williams
@Jason: your interpretation of the OP's question is disingenuous. He asked if he should _still_ worry about hoisting, not if he should worry about it. You were answering an unasked question as well.
int3
No, I answered "should I still worry about it" with the answer "no, not until you have proof that it runs too slowly". Most obvious hoisting cases we will automatically handle - we won't deliberately write inefficient code. So you shouldn't spend time *considering* whether extra optimisations are needed until you have some *proof* that it's worth the bother. Why are so many people unable to understand this simple point? The accepted answer starts with the same point, and nobody has hassled him about it. If you don't know if hoisting will help, TRY IT and LEARN something.
Jason Williams
A: 

Remember the 80-20 rule: 80% of execution time is spent in 20% of the program's code. There is no point in optimizing code that has no significant effect on the program's overall efficiency.

One should not bother with this kind of local optimization up front. The best approach is to profile the code to find the critical parts of the program that consume the most CPU cycles, and to optimize those. That kind of optimization really makes sense and results in improved program efficiency.

Ashish
80/20? My experience is that it is 98% and 2%. (Also, that's not my downvote.)
wallyk
No, the 80-20 rule is that 80% of the time that someone has to invent a percentage with no basis whatsoever in observation or statistics, they will choose 80%. The other 20% of the time, they will choose 20%.
Steve Jessop
The 80/20 rule is a common name for a Pareto distribution.
peterchen
++ I got clobbered for saying what I thought was the same thing :-), maybe less nicely. No doubt I'm old-fashioned, but I spent years writing assembly language, and what I want a compiler to do is write good assembly language for me. I don't want it to try to outsmart me and second-guess the structure of my program. If it thinks such changes will make my code faster, almost 100% of the time it won't, and anyway I would like such decisions to be mine.
Mike Dunlavey
+4  A: 

Compilers generally do an excellent job with this type of optimization, but they do miss some cases. Generally, my advice is: write your code to be as readable as possible (which may mean that you hoist loop invariants -- I prefer to read code written that way), and if the compiler misses optimizations, file bugs to help fix the compiler. Only put the optimization into your source if you have a hard performance requirement that can't wait on a compiler fix, or the compiler writers tell you that they're not going to be able to address the issue.

Stephen Canon
+1 for the focus on readability. I totally agree, and it seems very reasonable to assume that code that is easier for a human to understand has a better chance at being well optimized.
Peeter Joot
+22  A: 

If your profiler tells you there is a problem with a loop, and only then, a thing to watch out for is a memory reference in the loop which you know is invariant across the loop but the compiler does not. Here's a contrived example, bubbling an element out to the end of an array:

for ( ; i < a->length - 1; i++)
    swap_elements(a, i, i+1);

You may know that the call to swap_elements does not change the value of a->length, but if the definition of swap_elements is in another source file, it is quite likely that the compiler does not. Hence it can be worthwhile hoisting the computation of a->length out of the loop:

int n = a->length;
for ( ; i < n - 1; i++)
    swap_elements(a, i, i+1);

On performance-critical inner loops, my students get measurable speedups with transformations like this one.

Note that there's no need to hoist the computation of n-1; any optimizing compiler is perfectly capable of discovering loop-invariant computations among local variables. It's memory references and function calls that may be more difficult. And the code with n-1 is more manifestly correct.

As others have noted, you have no business doing any of this until you've profiled and have discovered that the loop is a performance bottleneck that actually matters.

Norman Ramsey
+1, excellent example
Adam Rosenfield
+1 for answering the question, actually teaching the OP something in the process, and not beating the asker over the head with regurgitated advice. I wish I could upvote this more.
tgamblin
it's okay, tgamblin, I upvoted it for you.
fennec
+5  A: 

Check the generated assembly and see for yourself. See if the computation for the loop-invariant code is being done inside the loop or outside the loop in the assembly code that your compiler generates. If it's failing to do the loop hoisting, do the hoisting yourself.

But as others have said, you should always profile first to find your bottlenecks. Once you've determined that this is in fact a bottleneck, only then should you check to see if the compiler's performing loop hoisting (aka loop-invariant code motion) in the hot spots. If it's not, help it out.
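As a hypothetical starting point (the function and file names here are mine), isolate the suspect loop, compile with `g++ -O2 -S hoist.cpp`, and look in `hoist.s` for whether the invariant computation appears once before the loop's branch target or is repeated in the loop body:

```cpp
#include <cassert>

// The multiplication n * n is loop-invariant. Inspecting the
// generated assembly shows whether the compiler computed it once
// outside the loop or redundantly on every iteration.
long weighted_sum(int n) {
    long total = 0;
    for (int i = 0; i < n; ++i)
        total += static_cast<long>(i) * (n * n);  // invariant: n * n
    return total;
}
```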

Adam Rosenfield
+1  A: 

Where they are likely to be important to performance, you still have to think about them.

Loop hoisting is most beneficial when the value being hoisted takes a lot of work to calculate. If it takes a lot of work to calculate, it's probably a call out of line. If it's a call out of line, the latest version of gcc is much less likely than you are to figure out that it will return the same value every time.

Sometimes people tell you to profile first. They don't really mean it, they just think that if you're smart enough to figure out when it's worth worrying about performance, then you're smart enough to ignore their rule of thumb. Obviously, the following code might as well be "prematurely optimized", whether you have profiled or not:

#include <iostream>

bool isPrime(int p) {
    for (int i = 2; i*i <= p; ++i) {
        if ((p % i) == 0) return false;
    }
    return true;
}

int countPrimesLessThan(int max) {
    int count = 0;
    for (int i = 2; i < max; ++i) {
        if (isPrime(i)) ++count;
    }
    return count;
}

int main() {
    for (int i = 0; i < 10; ++i) {
        std::cout << "The number of primes less than 1 million is: ";
        std::cout << countPrimesLessThan(1000*1000);
        std::cout << std::endl;
    }
}

It takes a "special" approach to software development not to manually hoist that call to countPrimesLessThan out of the loop, whether you've profiled or not.

Steve Jessop
I don't think it's "special" at all. It is very simple: "The code becomes simpler, and uses fewer lines, if I just call the function where it is needed. And I trust the compiler to hoist out the call for me". I don't see the point in obfuscating your code if you don't know that it's necessary. Of course, it does depend on how and where countPrimesLessThan is defined. If the definition is not visible to the compiler then it can't assume anything about the function, and it won't be able to safely move it outside the loop. But if the definition is visible, trust the compiler.
jalf
Well, this is not 100% correct. My countPrimesLessThan() might use memoization, and thus only incur the cost once. So it's really only valid if you _know_ what you're doing is expensive. And you might have to undo it later when refactoring.
Michael Anderson
@OP: On the contrary - we believe that people should profile first, before microoptimizing. You can always go through a piece of code and microoptimize (and then test to see if it helps); what you won't know is where to do it before profiling. If the performance is inadequate, and you're working in a hot spot, knock yourself out.
David Thornley
Well, at risk of spoiling an excellent theoretical discussion with actual facts, g++ 4.3.2 does not hoist countPrimesLessThan with -O3 and everything defined in a single translation unit. So if you trust your compiler, your code is 10 times slower than mine. And the problem with this kind of "special" programming is that if you deliberately write everything to be absurdly slow, nothing will stand out as being particularly hot. This code will profile as spending all its time in countPrimesLessThan either way, whether it's called 1 time or 10.
Steve Jessop
@Michael: true, it might memoize. But unfortunately, the same crew who instructed me not to hoist the call out of the loop, also bullied me into never memoizing anything, because that's premature optimization too. So as it happens, it doesn't memoize, and I've edited to show my implementation. So unless I'm very lucky, and countPrimesLessThan was written by someone smart enough to ignore the wisdom of the crowds, my code is still 10 times slower than it should be.
Steve Jessop
@Steve: If I trust my compiler, and my code turns out to be 10 times slower than expected, then finding and fixing the problem is pretty easy. Run the program in the debugger, pause it at some random time, look at the call stack, and voila, we have a function call inside a loop which could be hoisted out. Which would take me another 2 seconds to do. Or we could note that all the time is being spent inside the function, and rewrite it to use memoization. In short, I don't see the problem. Some optimizations are difficult to apply after the fact. Loop hoisting is not one of them.
jalf
But I think you have another interesting point about premature optimization. If the function uses memoization, then hoisting *doesn't matter*. And if we hoist it out of the loop, then memoizing the function *doesn't matter*. So you don't know that you'll gain anything by hoisting as soon as you write the loop. It might be a waste of time, and then why not settle for the cleanest version of the code? (Btw, out of interest I compiled it with MSVC, and it turns out it does hoist the function call out, without inlining the function.)
jalf
"10 times slower than expected" - but how do you know how long it's expected to take? As it happens, the loops in isPrime and countPrimesLessThan could also benefit from a little work (I estimate they could each be roughly tripled in speed without much effort or complication, leading to another x10 performance gain doing both of them). I don't even necessarily agree that the "special/simple" code is clearer - it's shorter, but since the spec is to print out the count 10 times, I don't believe it's unnatural to think of that as "Get the result. Print it 10 times."
Steve Jessop
@Steve: "but how do you know how long it's expected to take?". You don't need to know. Just take stackshots (http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/1562802#1562802) and you will see that it is doing unnecessary work, and what needs to be done to fix it. As jalf said, it's easy, and (IMHO) the method he said is not only the simplest but the best.
Mike Dunlavey
That tells you where all the time is spent. But as I've said already, where all the time is spent is the same, whether you call the slow function once or 10 times. It's spent *in the slow function*. Which we already knew, because that function is obviously slow compared with anything else in the program. Since the stackshot results will be identical whether the call is hoisted or not, I don't see how stackshots can tell you that the call needs to be hoisted. In short, stackshots and profilers don't tell you what's doing unnecessary work, they only tell you what's doing work.
Steve Jessop
@Steve. Maybe I'm missing the point, but I've done this many times. I've got *really slow function* F (not memoized), called 100 times in the loop. Stackshots show it. They show the call, in the loop. I look and say "that time consuming call is in the loop - that's silly" so I move it out for a big speedup. Now my shots still show the function, but the program completes in 1/100 the time. No?
Mike Dunlavey
Or you move it out of the loop and nothing happens, because the compiler was already hoisting it. My point is that the stackshots can't tell you that the function is called too many times, they can only tell you the blindingly obvious, which is that the function is slow. The call stack is exactly the same for each of the calls (unless it includes the values of all variables at all levels, I guess, and you spot that the loop counter is changing). Is that how you'd tell from the stackshots whether the function is actually called 100 times or 1? Or am I missing something obvious?
Steve Jessop
"I look and say "that time consuming call is in the loop - that's silly"". The main point of my example is just that we can say it's silly with or without stackshots or a profiler. It's silly when we write the code that way, and it remains silly until we fix it. *Unless* the compiler hoists, in which case I've wasted my time writing it my way in the first place, and unless I've missed something, you waste time during the optimisation process testing the same optimisation I made at initial coding time. I don't see why your way is better.
Steve Jessop
@Steve: Yes. That's the point. You can examine the call sites. In my experience, very few problems are at the "bottom" of the stack. Usually the program is in the middle of some system call, memory allocation, I/O, or some incredible nonsense deep in an abstraction layer. But what I can *do* about it is examine each call site and ask "could I do without this?" or "could I do this less often?". That's where the savings come.
Mike Dunlavey
... Let's assume the compiler didn't hoist the call, just to eliminate one variable. Consider the manually hoisted call versus the non-hoisted call. Are the stackshots different? Yes. Because you look at the source line where the function is called, and you see that it is outside the loop or inside. You know you should look at that call site because it appears on every, or nearly every, stackshot. If you see that it is inside the loop, it is practically begging you to find a way to call it less, such as by hoisting it.
Mike Dunlavey
Sorry, I just don't understand why the code obviously requires manual hoisting once you've taken the stackshots, but didn't obviously require manual hoisting when you wrote it. What have stackshots added in this example? My program does one thing which might be slow, and I correctly predicted when I wrote it that it would be slow. Counting prime numbers is slow. Why should I pretend that it might not be slow until I've wasted time fiddling with a debugger, that doesn't tell me anything I don't already know?
Steve Jessop
Well, when you write it, you can't depend on knowing it will be a problem. I mean like, if something else is taking 99% of the time, why sweat it? When I tune code, I have no idea where the problems will be. I don't even need to guess, because the stackshots will tell me "Fix THIS, you clown!" :-) Example: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773
Mike Dunlavey
The example isn't random, my point was to say that there are programs where you know before you run it where at least some of the problems are going to be, and that it's just dumb not to address them when you write the code. If a programmer has no idea at all of the likely runtime of his algorithms, compared against printing a short string, then OK, he needs to profile (or take stackshots) before he optimizes. But I'm not that programmer, and I strongly suspect you aren't either :-) Furthermore, I'm not convinced that this hypothetical incredibly dim programmer would realise he needs to hoist.
Steve Jessop
... In your example, essentially 100% of shots will be of the form: `main: line 21, countPrimesLessThan: line 13, isPrime: line 5`. So every one of those lines costs practically 100%, so any one you could "fix" would save you massive. Can you fix 5?, not really. Can you fix 13? Not really. Can you fix 21? Well, it's inside a loop. Is it repeating itself? Why, yes, it is. So that's your bug. Move it out.
Mike Dunlavey
Sure, I understand that, if you came across this crummy code and took stackshots, you'd fix it. I also understand how useful they are with code where you really can't tell what's going to be slow, and will have to do some research and experimentation. What I don't understand is why you can't (won't?) make the same fix in this example just by code inspection. Specifically, inspecting the code as you type it, and/or designing for performance before you type anything. Still, thanks for the thoughts, I'll let them percolate for a while.
Steve Jessop
Your example was constructed to be a worst case. If I'm coding and I've the least suspicion something will be a problem, I do the common-sense thing, obviously. But even if I'm the only programmer (which I'm not), and despite the best intentions, the code ends up having a number of performance problems, so I rely on stackshots to find them, which it does.
Mike Dunlavey
A: 

A good rule of thumb is that the compiler performs the optimizations it is able to prove safe. Does the optimization require knowledge about your code that isn't immediately obvious to the compiler? Then it is hard for the compiler to apply it automatically, and you may want to do it yourself.

In most cases, loop hoisting is a fully automatic process requiring no high-level knowledge of the code -- just a lot of lifetime and dependency analysis, which is what the compiler excels at in the first place.

It is possible to write code where the compiler is unable to determine whether something can be hoisted out safely though -- and in those cases, you may want to do it yourself, as it is a very efficient optimization.

As an example, take the snippet posted by Steve Jessop:

for (int i = 0; i < 10; ++i) {
    std::cout << "The number of primes less than 1 million is: ";
    std::cout << countPrimesLessThan(1000*1000);
    std::cout << std::endl;
}

Is it safe to hoist out the call to countPrimesLessThan? That depends on how and where the function is defined. What if it has side effects? It may make an important difference whether it is called once or ten times, as well as when it is called. If we don't know how the function is defined, we can't move it outside the loop. And the same is true if the compiler is to perform the optimization.

Is the function definition visible to the compiler? And is the function short enough that we can trust the compiler to inline it, or at least analyze the function for side effects? If so, then yes, it will hoist it outside the loop.

If the definition is not visible, or if the function is very big and complicated, then the compiler will probably assume that the function call can not be moved safely, and then it won't automatically hoist it out.
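A minimal illustration of why side effects block hoisting (this example is mine, not from the thread): because the function below mutates global state, one call is not equivalent to ten, so no conforming compiler may move it out of the loop.

```cpp
#include <cassert>

int calls = 0;  // observable side effect

// Mutates global state, so the compiler must preserve the number of
// calls: hoisting this out of a loop would change program behavior.
int counted_value() {
    ++calls;
    return 42;
}

int sum_ten() {
    int total = 0;
    for (int i = 0; i < 10; ++i)
        total += counted_value();  // must execute on every iteration
    return total;
}
```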

jalf
I've added to my answer the implementation I used to test whether my compiler is capable of this simple hoist (it isn't). It's pretty obvious to me that my functions have no side-effects, but apparently not to my compiler. Perhaps it doesn't want to be accused of cheating on benchmarks, and thinks my loop is so stupid it can only be a benchmark ;-) Or perhaps it's not as aggressive as your description suggests.
Steve Jessop
@jalf, @Steve: I know this example is contrived in order to make a point, but I can't think of a realistic case where I want to write something inside a loop and expect the compiler to move it out. If I wrote it inside the loop *by accident* then I'm being stupid, and the idea that a compiler is going to be smarter and wipe up after me is disturbing. It's a mantra that compilers optimize better than people do, but it gets repeated without being justified.
Mike Dunlavey
@Mike: I agree, compilers don't optimize better than people do *in general*. In certain cases they do, but certainly not every kind of optimization. But I'd argue that if you rely on a function call like this to be hoisted out of the loop, and the compiler doesn't do it, then it should be pretty easy to find out and then do it manually. It is such a simple and effective optimization either way that *if* the compiler doesn't do it for you, it's easy enough to manually remedy it. It's not one of those optimizations that *has* to be designed into the app from day 1.
jalf
@jalf: I guess the reason I get so exercised over it is that it's symptomatic of a broader lack of understanding of performance in general. I've heard it from compiler writers for decades that such optimizations accumulate and generally make the code "faster" in a way that's hard to argue with. But my 30 years of performance tuning says it has 100% - epsilon to do with what the *programmer* does and almost nothing to do with the compiler, **except** when it comes to optimizing expressions and assignment statements, register allocation, etc.
Mike Dunlavey
I think much the same as Mike about broader understanding - if you ignore performance while writing code, there's a steady drip-drip: "this decision saves me a second of thought or typing, and probably halves the speed". "Same again". "This one divides it by four". OK, so you can fix them later, like any code defect, but the ideal is "works first time". So to me it's a good trade-off to anticipate the predictable defects, as long as it's not a great contortion. I don't bother speculatively hoisting vector::size(), because I know that's fast. countPrimesLessThan is expected to be slow.
Steve Jessop
@Steve: What I've found is that the *big* performance problems arise in big software due to overcomplicated class/data structures with event-driven control structure created (oddly enough) because they are supposed to have "better performance". Only after I have beat the hell out of such code do I get down to where micro-optimization matters. Here's an example: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773
Mike Dunlavey
+1  A: 

Early optimizations are bad only if other aspects - like readability, clarity of intent, or structure - are negatively affected.

If you have to declare it anyway, loop hoisting can even improve clarity, and it explicitly documents your assumption "this value doesn't change".

As a rule of thumb, I wouldn't hoist the count/end iterator for a std::vector, because it's a common scenario that is easily optimized. I wouldn't hoist anything that I can trust my optimizer to hoist, and I wouldn't hoist anything known not to be critical - e.g. when running through a list of a dozen windows to respond to a button click. Even if it takes 50ms, it will still appear "instantaneous" to the user. (But even that is a dangerous assumption: if a new feature requires looping over this same code 20 times, it suddenly is slow.) You should still hoist expensive operations such as opening a file handle for appending, etc.
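A sketch of the "hoisting as documentation" idea (the function and names here are hypothetical): naming the hoisted length states explicitly that the string is not modified inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naming the hoisted bound documents the assumption that s does not
// change inside the loop, in addition to any performance effect.
int count_char(const std::string& s, char c) {
    const std::size_t len = s.size();  // invariant: s isn't modified below
    int count = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (s[i] == c) ++count;
    return count;
}
```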

In many cases - very well in loop hoisting - it helps a lot to consider relative cost: what is the cost of the hoisted calculation compared to the cost of running through the body?


As for optimizations in general, there are quite a few cases where the profiler doesn't help. Code may have very different behavior depending on the call path. Library writers often don't know their call paths or frequencies. Isolating a piece of code to make things comparable can already alter the behavior significantly. The profiler may tell you "Loop X is slow", but it won't tell you "Loop X is slow because call Y is thrashing the cache for everyone else". A profiler couldn't tell you "this code is fast because of your fancy CPU, but it will be slow on Steve's computer".


peterchen
Generally common-sense, and I (reluctantly) can see where some people might actually *like* the compiler to hoist code for them. Regarding "quite some cases where the profiler doesn't help", I attribute *that* to the abysmal state of profilers, based as they mostly still are on the concepts enthroned in **gprof**. I've been doing my level best to explain this, such as here: http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343
Mike Dunlavey
... One profiler that is breaking out of the gprof mold (partly) is this one (www.rotateright.com). The reason is that it samples call stacks, on wall-clock time, and it captures statement/instruction-level percents, and has a butterfly view optionally focussed on statements/instructions.
Mike Dunlavey