views: 212
answers: 4
For efficiency reasons, I always avoid writing loops like this:

    for(std::size_t i = 0; i < vec.size(); ++i) { ... }

where vec is an STL container. Instead, I either do

    const std::size_t vec_size = vec.size();
    for(std::size_t i = 0; i < vec_size; ++i) { ... }

or use the container iterators.
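
For reference, the iterator version looks roughly like this (just a sketch; assume vec is a std::vector<int>):

    #include <vector>

    void process(std::vector<int>& vec)
    {
        // end() is evaluated each iteration too, but like size() it is a
        // trivial inline accessor on mainstream implementations.
        for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it)
        {
            // ... use *it ...
        }
    }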

But how bad is the first solution really? I remember reading in Meyers that it will be quadratic instead of linear because the vector doesn't know its size and repeatedly has to count. But won't modern compilers detect this and optimize it away?

+7  A: 

vector::size() is constant-time and usually implemented as a trivial inline function that is optimised away. Don't bother hand-optimising it.

Marcelo Cantos
I'm not gonna downvote, but I don't agree: most often it's implemented as a subtraction, and in performance-critical sections it might be worth optimizing.
Viktor Sehr
As a subtraction AND a division (by the sizeof of the element type), which makes it even worse...
Jim Buck
(assuming the division isn't simple enough to be implemented by shifts/etc.)
Jim Buck
+1: I've done benchmarks, and it turns out the compiler will emit **identical** code for this type of thing (prompted by an argument in this post: http://stackoverflow.com/questions/535223/why-cant-i-push-this-object-onto-my-stdlist/535233#535233). The standard dictates that size() is a `const` function, so as long as you only use `const` operations in the loop, the compiler can know that it only needs to call it once. And by looking at the assembly, it does.
Evan Teran
@Evan: Thank you for sparing me the effort of arguing the point myself :-). I will add that it depends on what you do inside the loop. If it makes a non-inlined function call, there's a good chance the optimizer will give up the fight. But even then, the vast majority of types are a power of two in size, which makes the size calculation extremely cheap, and even if it isn't, a good compiler will be smart enough to multiply instead of dividing (g++ is, at least).
Marcelo Cantos
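
To make the comments above concrete, here is a rough sketch (not any particular library's actual code) of how size() is commonly implemented when a vector stores raw begin/end pointers. The pointer subtraction amounts to an integer subtraction followed by a division by sizeof(T), which the compiler reduces to a shift whenever sizeof(T) is a power of two:

    #include <cstddef>

    template <typename T>
    struct toy_vector // illustrative only
    {
        T* first_; // points at the first element
        T* last_;  // points one past the last element

        std::size_t size() const
        {
            // (last_ - first_) divides the byte distance by sizeof(T).
            return static_cast<std::size_t>(last_ - first_);
        }
    };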
+8  A: 

I remember reading in Meyers that it will be quadratic instead of linear because the vector doesn't know its size and repeatedly has to count.

You're getting vector and list confused. vector's size value is held in the vector; list's requires traversal of the actual list.

Billy ONeal
Just out of curiosity, why can't the same principle be applied to std::list? That is, store the list's size, and increment/decrement accordingly.
Steve Guidi
@Steve: Because `std::list::splice` is required to be constant time, and has to work with arbitrary iterators. You'd have to traverse the list to get the distance there, which would make it linear time, not constant.
Billy ONeal
Thank you for the explanation!
Steve Guidi
@Steve: You're welcome. A better explanation is in Effective STL Item 4: Call `empty` instead of checking size against zero.
Billy ONeal
Edit on the above: `splice` isn't *required* to be constant time, but it *may* be. If `splice` *is* implemented to be constant time, `size` cannot be. If `size` is implemented to be constant, `splice` cannot be. Which is chosen depends on your particular implementation of the STL.
Billy ONeal
List **might** require traversal of the actual list. (And I believe size is required to be constant in C++0x).
jalf
@jalf: meaning we won't have a constant-time splice? That seems a bit of a waste given that `splice` is one of the rare reasons to use a list in the first place...
Matthieu M.
@Matthieu: yep. On the other hand, it gives us a consistent guarantee about `size()`. Anyway, I could be wrong, I haven't checked the draft standard.
jalf
@Matthieu and @jalf: If you look at the current C++0x draft, the table for all sequence containers requires `size` to be constant time. In `list`'s description, it says it supports **most** of the generalized sequence container requirements, but not all. But it also **requires** `splice` to be constant in 23.3.4.4, which means the generalized complexity requirement on sequence containers cannot be met here.
Billy ONeal
@Billy: thanks for the clarification!
Matthieu M.
@Matthieu @Billy: Actually, only the `splice()` overloads that take an entire list or a single element are required to run in constant time (which makes sense, since you have the new size available to you). The `splice()` overload that takes a range from another list only has to run in linear time. Since the `list` specification does not state otherwise, I believe its `size()` must have constant time complexity, per the optional sequence requirements. I'm not 100% sure on how to parse the word "most" in §23.3.4 though :-)
James McNellis
@James: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3090.pdf page 778: `Complexity: Constant time if &x == this; otherwise, linear time.`
Billy ONeal
@Billy: Right; if you are splicing within the same list, the size doesn't change; if you are splicing a range from another list, the complexity is linear because it has to count the number of elements inserted.
James McNellis
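
As a concrete illustration of the trade-off discussed above, here is a small sketch (a hypothetical helper, not real library code). If a list caches its element count so that size() is O(1), then splicing a range from another list forces it to discover how many nodes are moving, and with arbitrary iterators the only way to do that is to walk the range; that is why a library cannot offer both an O(1) size() and an O(1) cross-list range splice:

    #include <iterator>
    #include <list>

    template <typename T>
    void splice_with_cached_sizes(std::list<T>& dst,
                                  typename std::list<T>::iterator pos,
                                  std::list<T>& src,
                                  typename std::list<T>::iterator first,
                                  typename std::list<T>::iterator last)
    {
        // Counting the range is linear in the number of spliced elements;
        // this is exactly the cost a constant-time splice is not allowed to pay.
        typename std::list<T>::difference_type n = std::distance(first, last);
        (void)n; // a real implementation would use n to update both cached sizes

        dst.splice(pos, src, first, last); // the node relinking itself is cheap
    }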
+2  A: 

The easiest way to tell whether something is being optimized out by the compiler is to compare its assembly output for the two versions.

That said, the two chunks of code are not actually equivalent. What if the size of the vector changes while you're iterating over it? The compiler would have to be very, very smart to prove conclusively that the vector's size could not change.

Now, in the real world, is this tiny optimization really worth the extra effort? The vec.size() just returns a stored value. It doesn't re-count the length.

Borealid
I see this advice all the time, and it has one fatal flaw: many of us do not know assembly language, and therefore would not be able to make sense of the compiled output. If you know assembly, then great. On the other hand, if we knew assembly, we'd probably not be asking the question in the first place.
Billy ONeal
@Billy I can't imagine doing C++ programming on a platform I didn't know at least the basics of assembly language for - if you don't know it, take a few days off to learn the basics, it's well worth the effort.
anon
@Borealid: the compiler actually can know conclusively that the size **can't change**, and easily. All it needs to do is guarantee that any operations on `vec` inside the loop are `const` (and don't reference non-const variables created outside the loop, since they may alias `vec`). So for simple things like summation, printing, etc., which are clearly read-only operations, it's trivial to determine the const-ness of the loop as a whole.
Evan Teran
@Evan Teran: And what if there are multiple threads accessing the structure, or it resides in a shared memory region? It's not that simple.
Borealid
@Borealid: C++03 is entirely unaware of threads and shared memory, and compilers for the most part ignore their existence, with a few exceptions. As far as optimizations go, it is entirely "ok" for a compiler to optimize as if those are not an issue, and except for a few select circumstances that's what they do.
Evan Teran
@Borealid: if the size can change during the loop, not re-reading size is the least of your problems. And as Evan said, compilers are supposed to ignore that possibility unless instructed otherwise.
peterchen
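
For readers who want to check this themselves, a minimal experiment might look like the following (assuming GCC; `-S` asks the compiler to emit assembly instead of object code):

    // loops.cpp -- compile with:  g++ -O2 -S loops.cpp -o loops.s
    // then compare the assembly emitted for the two functions.
    #include <vector>
    #include <cstddef>

    int sum_calling_size(const std::vector<int>& vec)
    {
        int total = 0;
        for (std::size_t i = 0; i < vec.size(); ++i) // size() in the condition
            total += vec[i];
        return total;
    }

    int sum_cached_size(const std::vector<int>& vec)
    {
        int total = 0;
        const std::size_t n = vec.size();            // size() hoisted by hand
        for (std::size_t i = 0; i < n; ++i)
            total += vec[i];
        return total;
    }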
+1  A: 

Consider the following stupid function:

#include <vector>
#include <cstddef>
using std::vector;

void sum (vector<int>& vec, int* sumOut)
{
    *sumOut = 0;
    for(std::size_t i = 0; i < vec.size(); ++i)
    {
        *sumOut += vec[i];
    }
}

The actual assembly generated will depend on the compiler and implementation of vector, but I think in most cases, the compiler has to re-read the vector's size from memory each time through the loop. This is because the sumOut pointer could potentially overlap (alias) the vector's internal storage of the size (assuming the vector stores its size in an int), so the size could be changed by the loop. If you call a function like this a lot, it could add up to a lot of cycles because you're touching memory more than you need.

Three possible solutions are:

  1. Store the size in a local variable. Ideally, the size will get stored in a register and avoid touching memory altogether. Even if it has to get put on the stack, the compiler should be able to order the loads/stores more efficiently.

  2. Use __restrict on the output pointer. This tells the compiler that the pointer can't possibly overlap anything else, so writes to it don't require reloading anything else.

  3. Reverse the loop. The termination condition then checks against 0 instead, so vec.size() only needs to be called once, when the loop starts.

Of those, I think #1 is the cleanest, but some people might prefer #3. #2 is probably the least reader-friendly, but might be faster than the others (because it means the vector's data could be read more efficiently). Sketches of all three variants appear below.

For more info on aliasing, see Christer Ericson's GDC presentation on memory optimization; there's an example almost identical to this in there.

celion
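
To make the three options above concrete, here are hedged sketches of each variant of the `sum` function (note that `__restrict` is a compiler extension rather than standard C++; GCC and Clang also accept `__restrict__`, and MSVC accepts `__restrict`):

    #include <vector>
    #include <cstddef>

    // 1. Cache the size in a local variable (hopefully a register).
    void sum_cached(std::vector<int>& vec, int* sumOut)
    {
        *sumOut = 0;
        const std::size_t n = vec.size();
        for (std::size_t i = 0; i < n; ++i)
            *sumOut += vec[i];
    }

    // 2. Promise the compiler that sumOut does not alias anything else.
    void sum_restrict(std::vector<int>& vec, int* __restrict sumOut)
    {
        *sumOut = 0;
        for (std::size_t i = 0; i < vec.size(); ++i)
            *sumOut += vec[i];
    }

    // 3. Reverse the loop so size() is only evaluated once, at initialization.
    void sum_reversed(std::vector<int>& vec, int* sumOut)
    {
        *sumOut = 0;
        for (std::size_t i = vec.size(); i-- > 0; )
            *sumOut += vec[i];
    }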