Using the latest gcc compiler, do I still have to think about these types of manual loop optimizations, or will the compiler take care of them for me well enough?
Write the code, profile it, and only think about optimising it when you have found something that is not fast enough, and you can't think of an alternative algorithm that will reduce/avoid the bottleneck in the first place.
With modern compilers, this advice is even more important - if you write simple clean code, the compiler's optimiser can often do a better job of optimising the code than it can if you try to give it snazzy "pre-optimised" code.
Remember 80-20 Rule.(80% of execution time is spent on 20% critical code in the program) There is no meaning in optimizing the code which have no significant effect on program's overall efficiency.
One should not bother about such kind of local optimization in the code.So the best approach is to profile the code to figure out the critical parts in the program which consumes heavy CPU cycles and try to optimize it.This kind of optimization will really makes some sense and will result in improved program efficiency.
Compilers generally do an excellent job with this type of optimization, but they do miss some cases. Generally, my advice is: write your code to be as readable as possible (which may mean that you hoist loop invariants -- I prefer to read code written that way), and if the compiler misses optimizations, file bugs to help fix the compiler. Only put the optimization into your source if you have a hard performance requirement that can't wait on a compiler fix, or the compiler writers tell you that they're not going to be able to address the issue.
If your profiler tells you there is a problem with a loop, and only then, a thing to watch out for is a memory reference in the loop which you know is invariant across the loop but the compiler does not. Here's a contrived example, bubbling an element out to the end of an array:
for ( ; i < a->length - 1; i++)
swap_elements(a, i, i+1);
You may know that the call to swap_elements
does not change the value of a->length
, but if the definition of swap_elements
is in another source file, it is quite likely that the compiler does not. Hence it can be worthwhile hoisting the computation of a->length
out of the loop:
int n = a->length;
for ( ; i < n - 1; i++)
swap_elements(a, i, i+1);
On performance-critical inner loops, my students get measurable speedups with transformations like this one.
Note that there's no need to hoist the computation of n-1
; any optimizing compiler is perfectly capable of discovering loop-invariant computations among local variables. It's memory references and function calls that may be more difficult. And the code with n-1
is more manifestly correct.
As others have noted, you have no business doing any of this until you've profiled and have discovered that the loop is a performance bottleneck that actually matters.
Check the generated assembly and see for yourself. See if the computation for the loop-invariant code is being done inside the loop or outside the loop in the assembly code that your compiler generates. If it's failing to do the loop hoisting, do the hoisting yourself.
But as others have said, you should always profile first to find your bottlenecks. Once you've determined that this is in fact a bottleneck, only then should you check to see if the compiler's performing loop hoisting (aka loop-invariant code motion) in the hot spots. If it's not, help it out.
Where they are likely to be important to performance, you still have to think about them.
Loop hoisting is most beneficial when the value being hoisted takes a lot of work to calculate. If it takes a lot of work to calculate, it's probably a call out of line. If it's a call out of line, the latest version of gcc is much less likely than you are to figure out that it will return the same value every time.
Sometimes people tell you to profile first. They don't really mean it, they just think that if you're smart enough to figure out when it's worth worrying about performance, then you're smart enough to ignore their rule of thumb. Obviously, the following code might as well be "prematurely optimized", whether you have profiled or not:
#include <iostream>
bool isPrime(int p) {
for (int i = 2; i*i <= p; ++i) {
if ((p % i) == 0) return false;
}
return true;
}
int countPrimesLessThan(int max) {
int count = 0;
for (int i = 2; i < max; ++i) {
if (isPrime(i)) ++count;
}
return count;
}
int main() {
for (int i = 0; i < 10; ++i) {
std::cout << "The number of primes less than 1 million is: ";
std::cout << countPrimesLessThan(1000*1000);
std::cout << std::endl;
}
}
It takes a "special" approach to software development not to manually hoist that call to countPrimesLessThan out of the loop, whether you've profiled or not.
A good rule of thumb is usually that the compiler performs the optimizations it is able to. Does the optimization require any knowledge about your code that isn't immediately obvious to the compiler? Then it is hard for the compiler to apply the optimization automatically, and you may want to do it yourself
In most cases, lop hoisting is a fully automatic process requiring no high-level knowledge of the code -- just a lot of lifetime and dependency analysis, which is what the compiler excels at in the first place.
It is possible to write code where the compiler is unable to determine whether something can be hoisted out safely though -- and in those cases, you may want to do it yourself, as it is a very efficient optimization.
As an example, take the snippet posted by Steve Jessop:
for (int i = 0; i < 10; ++i) {
std::cout << "The number of primes less than 1 billion is: ";
std::cout << countPrimesLessThan(1000*1000*1000);
std::cout << std::endl;
}
Is it safe to hoist out the call to countPrimesLessThan
? That depends on how and where the function is defined. What if it has side effects? It may make an important difference whether it is called once or ten times, as well as when it is called. If we don't know how the function is defined, we can't move it outside the loop. And the same is true if the compiler is to perform the optimization.
Is the function definition visible to the compiler? And is the function short enough that we can trust the compiler to inline it, or at least analyze the function for side effects? If so, then yes, it will hoist it outside the loop.
If the definition is not visible, or if the function is very big and complicated, then the compiler will probably assume that the function call can not be moved safely, and then it won't automatically hoist it out.
Early optimizations are bad only if other aspects - like readability, clarity of intent, or structure - are negatively affected.
If you have to declare it anyway, loop hoisting can even improve clarity, and it explicitely documents your assumption "this value doesn't change".
As a rule of thumb I wouldn't hoist the count/end iterator for a std::vector, because it's a common scenario easily optimized. I wouldn't hoist anything that I can trust my optimizer to hoist, and I wouldn't hoist anything known to be not critical - e.g. when running through a list of dozen windows to respond to a button click. Even if it takes 50ms, it will still appear "instanteneous" to the user. (But even that is a dangerous assumption: if a new feature requires looping 20 times over this same code, it suddenly is slow). You should still hoist operations such as opening a file handle to append, etc.
In many cases - very well in loop hoisting - it helps a lot to consider relative cost: what is the cost of the hoisted calculation compared to the cost of running through the body?
As for optimizations in general, there are quite some cases where the profiler doesn't help. Code may have very different behavior depending on the call path. Library writers often don't know their call path otr frequency. Isolating a piece of code to make things comparable can already alter the behavior significantly. The profiler may tell you "Loop X is slow", but it won't tell you "Loop X is slow because call Y is thrashing the cache for everyone else". A profiler couldn't tell you "this code is fast because of your snarky CPU, but it will be slow on Steve's computer".