views:

356

answers:

5

I found that this C++ code:

vector<int> a;
a.push_back(1);
a.push_back(2);
vector<int>::iterator it = a.begin();
a.push_back(4);
cout << *it;

print some big random number; but if you add a.push_back(3) between 3rd and 4th lines, it will print 1. Can you explain it to me?

+6  A: 

Vector iterators are only invalidated when the vector performs a reallocation.

The call to push_back(4) is causing the vector to allocate a new block of memory - this is what causes your iterator to become invalid. When you also use push_back(3), no reallocation is performed for push_back(4) so the iterator remains valid.

Paul Baker
+12  A: 

Edited with more careful wording

yes, resizing a vector might invalidate all iterators pointing into the vector.

The vector is implemented by internally allocating an array where the data is stored. When the vector grows, that array might run out of space, and when it does, the vector allocates a new, bigger, array, copies the data over to that, and then deletes the old array.

So your old iterators, which point into the old memory, are no longer valid. If the vector is resized downwards (for example by pop_back()), however, the same array is used. The array is never downsized automatically.

One way to avoid this reallocation (and pointer invalidation) is to call vector::reserve() first, to set aside enough space that this copying isn't necessary. In your case, if you called a.reserve(3) before the first push_back() operation, then the internal array would be big enough that the push_back's can be performed without having to reallocate the array, and so your iterators will stay valid.

jalf
You're first sentence doesn't match with your last paragraph. If you resize a vector to a size that is less than the capacity reserved by a previous reserve call then valid iterators before the resize are guaranteed to stay valid. So: "Resizing a vector may invalidate all iterators pointing into the vector."
Charles Bailey
The situation is like this: invalidation occurs *if* the new addition outgrows the reserved space *and* the new low-level allocation lies in a different part of memory (because low level allocators are allowed to attempt to grow the block in place). But by design `std::vector()` prevents you from finding out if these conditions apply. So the only way to be sure the iterators stay valid after a `push_back()` is to manually reserve sufficient space ahead of time.
dmckee
Actually, you can check the 'capacity' on most implementations, I don't know if it's required by the standard though.
Matthieu M.
True, "resize" probably wasn't the best choice of words. Resizing downwards won't cause any problems, and resizing upwards *might* be ok (if by 'resize' we mean the `resize()` function, and we'd previously called `reserve()`.
jalf
I thought that Matthieu M. was right, but now I'm not so sure. The standard says that `insert` (and hence, via reference, `push_back`) causes reallocation if the new size is greater than the capacity. It then goes on to say that if no reallocation occurs, iterators before the insertion point (all iterators to elements for `push_back`) remain valid. Unfortunately the standard doesn't appear to say anything about the converse, i.e. it doesn't say "If the new size is no greater than the capacity then reallocation does not occur" unless this is implied somewhere else.
Charles Bailey
Update: There is a note under the notes for `reserve` that guarantees that after a call to `reserve` no insertion will cause a reallocation unless that insertion would cause the size of the vector to exceed the size passed to the `reserve` call. This still isn't a guarantee that an insertion that keeps the size less than the capacity won't cause a reallocation. That's subtly different.
Charles Bailey
@Charles: `capacity()` is *defined* to return "the total number of elements that the vector can hold without requiring reallocation". I don't believe the committee intended "require" in this context to have a meaning such that vectors may reallocate when not required to do so. So a vector can only reallocate if the new size exceeds the capacity, which is your second (stronger) guarantee.
Steve Jessop
+2  A: 

Yes, any action that might change the size of the vector can invalidate iterators.

Edit: That includes operations (e.g. erase(), resize()) that reduce the size of the container. erase() doesn't invalidate all iterators, but it does invalidate any iterators referring to any point after the erased element(s). resize() is defined in terms of insert() and erase(), so it has the same potential.

Jerry Coffin
Reducing the size shouldn't.
David Thornley
A: 

The rules for iterator invalidation are specific to a container.

Now invalidation may have 2 meanings with a vector:

  1. Invalidation = point outside of the range defined by [begin,end]
  2. Invalidation = point to a different object from the original one

As you can see, the second is much more strict:

std::vector<int> myVector;
myVector.push_back(0);
myVector.push_back(1);

std::vector<int>::iterator it = myVector.begin(); // it points to 0
myVector.erase(it); // it points to 1
myVector.erase(it); // it == myVector.end()

In this case, it is 'valid' in that it is always in the inclusive range [begin,end] and so can be safely used for any operation on myVector. On the other hand the expression (*it) keeps changing: first it returns 0, then 1, then it has undefined behavior...

In general, people will rather speak about the 2nd requirement, and invalidating an iterator simply means that (*it) may not produce the same result as before.


Now that this is said, there are several ways to invalidate an iterator on a Vector (in fact, it's the less stable structure of the STL).

During additions of elements:

  • This may trigger a reallocation (1) if myVector.size() == myVector.capacity(), since checking this is error-prone, we usually consider that any addition will invalidate the iterators
  • If you want to be 'picky' and knows that the reallocation is not triggered, then you still have to worry about insert. Inserting an element invalidates the iterators pointing to this current position and all subsequent ones since the elements are shifted one step towards the end of the vector.

During removal of elements:

  • There is no reallocation, even if the buffer is now much bigger than needed. It is possible to force this though, using the shrink to fit idiom (2).
  • All iterators pointing past the element removed are invalidated. Especially, the previous 'end' iterator is now beyond the [begin,end] range and cannot be used safely within STL algorithms for example.

(1) The internal structure of a std::vector is an array of T, this is due for compatibility with the C-programs (using &myVector.front() as the address of the array) and because it guarantees contiguity and a minimum space overhead (ie, the amount of space taken by the vector own data vs the amount of space occupied by an object)

At any moment, you can know how many objects a vector can hold using .capacity() method.

When you want to insert an object and the vector does not have the necessary capacity, a call to the .reserve(size_t) method is triggered. This method, if the number of items required is superior to the current capacity, triggers a reallocation.

The vector then allocates a new array of elements (its size is generally 2*n+1 where n is the current capacity), copies the elements of the current array to the new array, discards the current array.

Because it discards the current array, your iterators are invalidated as vector iterators are generally simple pointers (for efficiency).

Note that if the iterators were implemented as: a reference to the vector + a count, and dereferencing was actually *(&m_vector.front() + n) reallocation would not invalidate them... but they would be less efficient.

(2) Shrink to fit: Warning, this triggers a COPY of the elements and invalidates iterators.

// myVector has 10 elements, but myVector.capacity() == 1000
myVector.swap(std::vector<int>(myVector));

It first creates a temporary vector, which will allocate only as much memory as is needed (with a minimum depending on the library), and copy the elements of myVector. Then the swap operation exchange the buffers from myVector and this copy, and thus myVector now hols a buffer with the minimum amount of memory necessary. At the end of the operation the temporary is destroyed and the memory it holds released.

Matthieu M.
A: 

For future reference, all of the STL sorts of tidbits like this are on SGI's website: http://www.sgi.com/tech/stl/Vector.html

If you need iterators to stay valid after you add or delete to a collection, take a look at another kind of collection, like a list.

The best thing to do though is to identify out of the matrix of features you want from a collection (random access, etc.) then pick the right container.

See the wikipedia article on Standard_Template_Library Containers for a starting point. If you have the cash, I highly recommend Scott Meyer's "Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library".

Apologies for lack of supporting links, I'm a newbie here and lack the reputation to post this with more than one.

Tim Finer