Loop unrolling will not magically make the code executed in the loop run faster. All it does is save the few CPU cycles spent incrementing and comparing the loop variable. So it only makes sense in very tight loops where the loop body itself does next to nothing.
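Just to make that concrete, here is a minimal sketch (the names sum_plain and sum_unrolled are mine, and a modern optimizing compiler will likely do this transformation itself anyway): the unrolled version pays the loop overhead once per four elements, but the body still does exactly the same work.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Plain loop: one compare and one increment of i per element.
int sum_plain(const std::vector<int>& v) {
    int sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum;
}

// Manually unrolled by four: one compare and one increment of i per four
// elements, plus a cleanup loop for the leftovers. The body does no less work.
int sum_unrolled(const std::vector<int>& v) {
    int sum = 0;
    std::size_t i = 0;
    const std::size_t n = v.size();
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; ++i)
        sum += v[i];
    return sum;
}

int main() {
    std::vector<int> v(1000, 1);
    std::cout << sum_plain(v) << ' ' << sum_unrolled(v) << '\n';  // both print 1000
}
```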
Regarding your example: while push_back() takes amortized constant time, that constant includes the occasional allocate-copy-deallocate cycle plus the copying of the actual objects. I very much doubt that the loop-counter comparisons play a significant role compared to that. And if you replace push_back() with anything else that takes a long time, the same applies.
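To illustrate where the time actually goes, here is a sketch with a made-up Widget type that is non-trivial to copy. Whether reserve() pays off depends on your real element type and element count, but it attacks the reallocation cost, which dwarfs anything unrolling could save on the loop counter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type that does real work when copied or moved.
struct Widget {
    std::string name;
    double      values[8];
};

int main() {
    const std::size_t n = 100000;

    std::vector<Widget> w;
    w.reserve(n);                  // one allocation up front instead of the
                                   // occasional allocate-copy-deallocate cycle
    for (std::size_t i = 0; i < n; ++i)
        w.push_back(Widget{});     // amortized O(1), but each call still
                                   // constructs and moves a Widget
}
```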
Of course, this could be wrong on one specific CPU and right on another. With the idiosyncrasies of modern CPU architectures, their caches, instruction pipelines, and branch prediction schemes, it has become very hard to outsmart the compiler at optimizing code. That you would attempt to optimize a loop with a "heavy" body by unrolling it suggests that you don't yet know enough about this area to gain much from it. (I'm trying hard to say this in a way that won't offend you. I'm the first to admit that I'm a loser at this game myself.)
If you're having problems with performance, IME in 9 out of 10 cases eliminating silly errors (like needlessly copying complex objects) and optimizing algorithms and data structures is what you should look at; see the sketch at the end for the kind of thing I mean.
(If you still believe your problem falls into the 1-out-of-10 category, then try Intel's compiler. The last time I looked at it you could download a test version for free, it plugged into VS, was very easy to set up, and brought about a 0.5% speed gain in the application I tested it on.)
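As a sketch of the kind of silly error I mean, with a hypothetical process() function standing in for real work: the only difference between the two loops below is whether each element is copied or taken by reference.

```cpp
#include <string>
#include <vector>

// Stand-in for whatever you actually do with each element.
void process(const std::string&) { /* real work here */ }

// Copies every string just to look at it.
void slow(const std::vector<std::string>& lines) {
    for (std::string line : lines)
        process(line);
}

// Same result, no copies.
void fast(const std::vector<std::string>& lines) {
    for (const std::string& line : lines)
        process(line);
}

int main() {
    std::vector<std::string> lines(3, "some reasonably long line of text");
    slow(lines);
    fast(lines);
}
```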