views:

546

answers:

5

How do you tell the compiler to unroll loops based on the number of iterations or some other attribute? Or, how do you turn on loop unrolling optimization in Visual Studio 2005?

EDIT: E.g.

//Code Snippet 1
    vector<int> b;
    for(int i=0;i<3;++i) b.push_back(i);

As opposed to

//Code Snippet 2
    vector<int> b;
    b.push_back(0);
    b.push_back(1);
    b.push_back(2);

push_back() is an example, I could replace this with anything which can take a long time.

But I read somewhere that I can use Code 1 and the compiler can unroll it to Code 2 if the loop satisfies some criteria. So my question is: how do you do that? There's already a discussion on SO as to which one is more efficient but any comments on that is appreciated anyway.

+4  A: 

Usually you just let the compiler to its job. If the number of loops is known at compile-time, and compiler optimizations are turned on, the compiler will balance code-size with branch reduction and unroll any unrollable loops.

If that's really not what you want, there's also the possibility of doing it yourself with Duff's Device: (from wikipedia)

send(to, from, count)
register short *to, *from;
register count;
{
    register n=(count+7)/8;
    switch(count%8){
    case 0: do{ *to = *from++;
    case 7:  *to = *from++;
    case 6:  *to = *from++;
    case 5:  *to = *from++;
    case 4:  *to = *from++;
    case 3:  *to = *from++;
    case 2:  *to = *from++;
    case 1:  *to = *from++;
     }while(--n>0);
    }
}

This gives you unrolling with runtime determined iteration counts.

If it's still compile-time unrolling you want, and the built in optimizations aren't what you want (if you want finer-grained control), you could create a C++ template to do what you want. This is a pretty trivial template application, and since it is all done at compile time, you don't lose any function inlining or other optimizations that the compiler might do in addition.

Eclipse
sbi
You're really going to have a hard time outsmarting the compiler these days - and so far in 5 years of back-end server work I've never seen a need for Duff's Device. We have had a case where the template solution was needed - by tuning to the cache sizes of the target servers, we managed to get a 10x speedup in one particularly bottlenecking operation.
Eclipse
Why is this downvoted? Seems good to me. It says "don't bother", and explains what to do in the rare cases where you want to manually ensure unrolling.
jalf
I think it's harsh to mark this down, the answer says you should normally let the compiler do its job.
Tom
My OP was regarding those compiler optimizations in VS2005.
Jacob
If you're wanting to know how to enable optimizations, generally the easiest way is to compile in release mode.
Eclipse
+5  A: 

It's generally fairly simple: "You enable optimizations".

If you tell the compiler to optimize your code, then loop unrolling is one of the many optimizations it tries to apply.

Keep in mind though, that unrolling is not always going to produce faster code. It might cause cache misses (in both data and instruction cache). And with the advanced branch prediction found in modern CPU's, the costs of the branches that make up a loop is often negligible.

Sometimes, the compiler may determine that unrolling would produce slower code, and then it won't do it.

jalf
Right, but how? A VS2005 specific answer would be beneficial
Jacob
Unless you work on the compiler optimization team at Microsoft, it won't be possible to answer the question specifically. The documentation is very vague. Unless you just want to know how to turn on optimizations? (RTFM)
Mark Ransom
If turning on optimizations is going to achieve this, then that's not a problem. But I read somewhere that the user could specify how many iterations is required before the loops were unrolled and was wondering if anyone here knew that. But if the compiler estimates the time etc. as jalf has stated, then I guess turning on optimization is good enough.
Jacob
Visual studio gives you some control over the depth of inlining (#pragma inline_depth), but not over loop unrolling.
Eclipse
The compiler does try to estimate it. Keep in mind though, that it has to be a compile-time decision. It can't let the loop run for X iterations and *then* decide to unroll. So if the number of iterations is a compile-time constant, the compiler will be much better able to determine when and how to unroll. If the number of iterations is determined at runtime, the compiler's options are much more limited - and you *may* then have to manually unroll the loop. (After profiling has shown performance to be a problem, of course)
jalf
+4  A: 

Loop unrolling will not magically make the code executed in the loop run faster. All it does is to save a few CPU cycles used for comparing the loop variable. So it only makes sense in very tight loops where the loop body itself does next to nothing.

Regarding your example: While push_back() takes amortized constant time, this does include the occasional allocate-copy-deallocate cycle plus the copying of the actual objects. I very much doubt that the comparisons in the loop play a significant role compared to that. And if you replace it with anything else taking a long time, the same applies.

Of course, this could be wrong on any specific CPU and right on any other. With the idiosyncrasies of modern CPU architectures with their caches, instruction pipelines and branch prediction schemes it has become very hard to outsmart the compiler in optimizing code. That you would attempt to optimize a loop with a "heavy" body by unrolling it seems to be a hint that you don't know enough to achieve much in this. (I'm trying hard to say this so you won't be offended. I'm the first to admit that I'm a looser in this game myself.)

If you're having problems with performance, IME in 9 out of 10 cases eliminating silly errors (like copying complex objects) and optimizing algorithms and data structures is what you should look at.

(If you still believe your problem falls into the 1-out-of-10 category, then try Intel's compiler. The last time I looked at it you could download a test version for free, it plugged into VS, was very easy to setup, and brought about 0.5% of speed gain in the application I tested it in.)

sbi
+5  A: 

Note that you say:

push_back() is an example, I could replace this with anything which can take a long time.

In fact, if push_back() (or whatever you replace it with) takes a long time, that's a situation where loop unrolling would be a waste of effort. Looping generally isn't particularly slow; the times where loop unrolling makes sense is where the work done inside the loop is very small - in that case the looping constructs might start to dominate the processing of that stretch of execution.

As I'm sure you'll get in many other answers - don't worry about this type of thing unless you actually find that it's a bottleneck. 99% of the time, it won't be.

Michael Burr
+3  A: 
Tom
Thanks for the tip about VS!
Jacob