Consider the following stupid function:
void sum (vector<int>& vec, int* sumOut)
{
*sumOut = 0;
for(std::size_t i = 0; i < vec.size(); ++i)
{
*sumOut += vec[i];
}
}
The actual assembly generated will depend on the compiler and implementation of vector
, but I think in most cases, the compiler has to re-read the vector
's size from memory each time through the loop. This is because the sumOut
pointer could potentially overlap (alias) the vector's internal storage of the size (assuming the vector
stores its size in an int), so the size could be changed by the loop. If you call a function like this a lot, it could add up to a lot of cycles because you're touching memory more than you need.
Three possible solutions are:
Store the size in a local variable.
Ideally, the size this will get
stored in a register and avoid touching
memory altogether. Even if it has to
get put on the stack, the compiler
should be able to order the
loads/stores more efficiently.
Use __restrict
on the output
pointer. This tells the compiler
that the pointer can't possibly
overlap anything else, so writes to
it don't require reloading anything
else.
Reverse the loop. The termination
condition now checks against 0
instead, so vec.size()
is never
called again.
Of those, I think #1 is the cleanest, but some people might prefer #3. #2 is the probably least reader-friendly, but might be faster than the others (because it means the vector's data could be read more efficiently).
For more info on aliasing, see Christer Ericson's GDC presentation on memory optimization; there's an example almost identical to this in there.