I've enabled iterator debugging in an application by defining

_HAS_ITERATOR_DEBUGGING = 1

I was expecting this to just check vector bounds, but I have a feeling it's doing a lot more than that. What checks are actually being performed?

Dinkumware STL, by the way.

A: 

As far as I understand:

_HAS_ITERATOR_DEBUGGING displays an assertion dialog box at run time when it detects incorrect iterator use, including:

1) Iterators used in a container after an element is erased

2) Iterators into a vector used after a .push_back() or .insert() call has invalidated them

Ami
A: 

According to http://msdn.microsoft.com/en-us/library/aa985982%28v=VS.80%29.aspx

The C++ standard describes which member functions cause iterators to a container to become invalid. Two examples are:

  • Erasing an element from a container causes iterators to the element to become invalid.
  • Increasing the size of a vector (push_back or insert) causes iterators into the vector container to become invalid.
mathmike
A: 

There are a number of iterator operations that lead to undefined behavior; the goal of this trigger is to activate runtime checks (using asserts) that prevent it from occurring.

The issue

The most obvious one is using an invalid iterator, but this invalidity may arise for various reasons:

  • Uninitialized iterator
  • Iterator to an element that has been erased
  • Iterator to an element whose physical location has changed (reallocation for a vector)
  • Iterator outside of [begin, end)

The standard specifies in excruciating detail, for each container, which operations invalidate which iterators.
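For instance, vector::push_back only invalidates iterators when it has to reallocate, which happens when size() has reached capacity(); reserve() guarantees no reallocation until an insertion would push size() past capacity(). A minimal sketch relying on that guarantee (push_back_would_reallocate and storage_stable_up_to_capacity are hypothetical helper names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if appending one more element to v would force a reallocation,
// which invalidates every iterator, pointer, and reference into v.
template <typename T>
bool push_back_would_reallocate(const std::vector<T>& v) {
    return v.size() == v.capacity();
}

// Fills a vector up to its reserved capacity and reports whether the
// element storage stayed in place (i.e. existing iterators stayed valid).
bool storage_stable_up_to_capacity(std::size_t n) {
    assert(n > 0 && "need room for at least one element");
    std::vector<int> v;
    v.reserve(n);               // no reallocation until size() exceeds capacity()
    v.push_back(0);
    const int* addr = &v[0];    // address of the first element
    while (!push_back_would_reallocate(v)) {
        v.push_back(1);         // stays within capacity, so never reallocates
    }
    return &v[0] == addr;       // guaranteed true by the reserve() rule
}
```

The next push_back after the loop would exceed capacity and move the buffer; any iterator taken before it is then invalid, and that is precisely the kind of use iterator debugging is meant to flag.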

There is a somewhat less obvious reason that people tend to forget: mixing iterators from different containers:

std::vector<Animal> cats, dogs;

for_each(cats.begin(), dogs.end(), /**/); // obvious bug

This pertains to a more general issue: the validity of ranges passed to the algorithms.

  • [cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
  • [cats.end(), cats.begin()) is invalid (unless cats is empty, in which case begin() == end() and the range is simply empty)

The solution

The solution consists of adding information to the iterators so that their validity, and the validity of the ranges they define, can be asserted during execution, thus preventing undefined behavior from occurring.

The _HAS_ITERATOR_DEBUGGING symbol serves as an opt-in trigger for this capability, because it unfortunately slows down the program. It's quite simple in theory: each iterator is made an Observer of the container it was issued from and is thus notified of its modifications.

In Dinkumware this is achieved by two additions:

  • Each iterator carries a pointer to its related container
  • Each container holds a linked list of the iterators it created
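A toy sketch of these two additions, assuming a vector-like container of ints: every iterator stores a parent pointer and threads itself into a singly linked list owned by the container, and erase() walks that list to orphan the iterators it invalidates. The class names and the intrusive-list layout are hypothetical choices for this illustration, not Dinkumware's actual internals, and the sketch ignores the locking a real implementation needs in multithreaded programs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ObservedVector;  // forward declaration

// Hypothetical iterator that observes its container (names are illustrative).
class ObservedIter {
public:
    ObservedIter() = default;  // no parent: an "uninitialized" iterator
    ObservedIter(ObservedVector* parent, std::size_t pos);
    ObservedIter(const ObservedIter& other);
    ObservedIter& operator=(const ObservedIter& other);
    ~ObservedIter();

    // True while the iterator is still registered with a live container.
    bool valid() const { return parent_ != nullptr; }

private:
    friend class ObservedVector;
    void attach(ObservedVector* parent);  // register with the container
    void detach();                        // unregister from the container

    ObservedVector* parent_ = nullptr;  // addition 1: pointer to the container
    ObservedIter* next_ = nullptr;      // link in the container's iterator list
    std::size_t pos_ = 0;               // element position
};

class ObservedVector {
public:
    explicit ObservedVector(std::size_t n) : data_(n) {}
    ObservedVector(const ObservedVector&) = delete;  // keep the sketch simple
    ObservedVector& operator=(const ObservedVector&) = delete;
    ~ObservedVector() { orphan_from(0); }  // surviving iterators become invalid

    ObservedIter at(std::size_t i) { return ObservedIter(this, i); }

    // vector::erase invalidates iterators at or after the erased position.
    void erase(std::size_t i) {
        assert(i < data_.size());
        data_.erase(data_.begin() + static_cast<std::ptrdiff_t>(i));
        orphan_from(i);
    }

private:
    friend class ObservedIter;

    // Notification: walk the live iterators, unlink and orphan stale ones.
    void orphan_from(std::size_t i) {
        ObservedIter** link = &head_;
        while (*link != nullptr) {
            ObservedIter* it = *link;
            if (it->pos_ >= i) {
                *link = it->next_;
                it->parent_ = nullptr;
                it->next_ = nullptr;
            } else {
                link = &it->next_;
            }
        }
    }

    std::vector<int> data_;
    ObservedIter* head_ = nullptr;  // addition 2: list of issued iterators
};

ObservedIter::ObservedIter(ObservedVector* parent, std::size_t pos) : pos_(pos) {
    attach(parent);
}
ObservedIter::ObservedIter(const ObservedIter& other) : pos_(other.pos_) {
    attach(other.parent_);
}
ObservedIter& ObservedIter::operator=(const ObservedIter& other) {
    if (this != &other) {
        detach();
        pos_ = other.pos_;
        attach(other.parent_);
    }
    return *this;
}
ObservedIter::~ObservedIter() { detach(); }

void ObservedIter::attach(ObservedVector* parent) {
    parent_ = parent;
    if (parent_ != nullptr) {  // push onto the front of the list
        next_ = parent_->head_;
        parent_->head_ = this;
    }
}

void ObservedIter::detach() {
    if (parent_ == nullptr) return;  // already orphaned: nothing to unlink
    ObservedIter** link = &parent_->head_;
    while (*link != this) link = &(*link)->next_;  // linear search
    *link = next_;
    parent_ = nullptr;
    next_ = nullptr;
}
```

After v.erase(2) on a five-element ObservedVector, an iterator at position 1 is still attached while one at position 3 has been orphaned, which is the information a checked dereference would consult.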

And this neatly solves our problems:

  • An uninitialized iterator has no parent container, so most operations on it (apart from assignment and destruction) trigger an assertion
  • An iterator to an erased or moved element has been notified (thanks to the list) and knows it is invalid
  • When incremented or decremented, an iterator can check that it stays within bounds
  • Checking that two iterators belong to the same container is as simple as comparing their parent pointers
  • Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (a linear operation for containers that are not random access, which is most of them)
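The parent-pointer checks in this list can be sketched as follows for a random-access container. To keep failures observable, the sketch throws std::logic_error where a debug library would assert; CheckedIter, checked_begin, checked_end, same_container and valid_range are hypothetical names. It covers only the checks that need the parent pointer; invalidation tracking would additionally require the container's iterator list.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked iterator over std::vector: a parent pointer plus an index.
// Failed checks throw std::logic_error here; a debug library would assert instead.
template <typename T>
class CheckedIter {
public:
    CheckedIter() = default;  // no parent container: "uninitialized"
    CheckedIter(std::vector<T>* parent, std::size_t pos) : parent_(parent), pos_(pos) {}

    T& operator*() const {
        require(parent_ != nullptr, "dereferencing an uninitialized iterator");
        require(pos_ < parent_->size(), "dereferencing end() or beyond");
        return (*parent_)[pos_];
    }
    CheckedIter& operator++() {
        require(parent_ != nullptr, "incrementing an uninitialized iterator");
        require(pos_ < parent_->size(), "incrementing past end()");
        ++pos_;
        return *this;
    }
    CheckedIter& operator--() {
        require(parent_ != nullptr, "decrementing an uninitialized iterator");
        require(pos_ > 0, "decrementing before begin()");
        --pos_;
        return *this;
    }

    // Same container? Just compare the parent pointers.
    friend bool same_container(const CheckedIter& a, const CheckedIter& b) {
        return a.parent_ != nullptr && a.parent_ == b.parent_;
    }
    friend bool operator==(const CheckedIter& a, const CheckedIter& b) {
        require(same_container(a, b), "comparing iterators from different containers");
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const CheckedIter& a, const CheckedIter& b) { return !(a == b); }

    // [first, last) is valid if both iterators share a container and first does
    // not come after last. O(1) here only because vector is random access.
    friend bool valid_range(const CheckedIter& first, const CheckedIter& last) {
        return same_container(first, last) && first.pos_ <= last.pos_ &&
               last.pos_ <= first.parent_->size();
    }

private:
    static void require(bool ok, const char* what) {
        if (!ok) throw std::logic_error(what);
    }

    std::vector<T>* parent_ = nullptr;
    std::size_t pos_ = 0;
};

template <typename T>
CheckedIter<T> checked_begin(std::vector<T>& v) { return CheckedIter<T>(&v, 0); }

template <typename T>
CheckedIter<T> checked_end(std::vector<T>& v) { return CheckedIter<T>(&v, v.size()); }
```

With these iterators, the earlier for_each(cats.begin(), dogs.end(), ...) mistake fails valid_range because the two parent pointers differ.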

The cost

The cost is heavy, but can you put a price on correctness? We can break it down, though:

  • extra memory allocation (the list of iterators maintained by each container): O(NbIterators)
  • notification process on mutating operations: O(NbIterators) (note that push_back or insert do not necessarily invalidate iterators, but erase does)
  • range validity check: O(min(last - first, container.end() - first))
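The linear range check and its O(min(...)) cost can be sketched for std::list, which has no random access: walk forward from first and succeed if last is reached no later than end(). list_range_is_valid is a hypothetical helper; it assumes both iterators come from c, because comparing iterators from different lists is itself not defined by the standard (and would typically trip a debug assertion).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Walks forward from 'first'; [first, last) is valid if 'last' is reached no
// later than c.end(). Takes min(distance(first, last), distance(first, end))
// steps. Assumes 'first' and 'last' are both valid iterators into c.
// The optional 'steps' out-parameter reports how far the walk went.
template <typename T>
bool list_range_is_valid(const std::list<T>& c,
                         typename std::list<T>::const_iterator first,
                         typename std::list<T>::const_iterator last,
                         std::size_t* steps = nullptr) {
    std::size_t n = 0;
    for (auto it = first;; ++it, ++n) {
        if (it == last) {          // reached the end of the range first: valid
            if (steps) *steps = n;
            return true;
        }
        if (it == c.end()) {       // hit the container's end first: invalid
            if (steps) *steps = n;
            return false;
        }
    }
}
```

On a five-element list, [begin, end) is accepted after five steps, while [std::next(begin, 2), begin) is rejected after three steps, when the walk hits end().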

Most of the library algorithms have, of course, been implemented for maximum efficiency: typically the check is done once and for all at the beginning of the algorithm, and then an unchecked version is run. Yet execution may still slow down severely, especially with hand-written loops:

for (iterator_t it = vec.begin();
     it != vec.end();              // Oops
     ++it)
{
    // body
}

We know the Oops line is in bad taste, but here it's even worse: on each iteration of the loop we create a new iterator and then destroy it, which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating/deallocating memory in a tight loop?
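A common workaround for the per-iteration cost described above is to compute end() once, before the loop. A minimal sketch, assuming a loop body that only reads the elements (sum_hoisted is a hypothetical name); how much debug overhead this saves depends on the library version, since the comparison itself is still checked.

```cpp
#include <cassert>
#include <vector>

// Hoists end() out of the loop condition: one end iterator is created per
// loop instead of one per iteration. Only safe if the body does not insert
// into or erase from 'vec', which could invalidate 'last'.
int sum_hoisted(const std::vector<int>& vec) {
    int total = 0;
    for (std::vector<int>::const_iterator it = vec.begin(), last = vec.end();
         it != last; ++it) {
        total += *it;
    }
    return total;
}
```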

Of course, a for_each would not encounter such an issue, which is yet another compelling case for using STL algorithms instead of hand-coded loops.
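The same loop written with std::for_each might look like this (sum_for_each is a hypothetical name, and the lambda requires C++11). Per the description above, the library can validate [begin, end) once on entry and then run its inner loop unchecked, so no iterator is created inside the loop condition.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same loop expressed as an algorithm: end() is evaluated once, when the
// range is passed in, and the range can be validated up front.
int sum_for_each(const std::vector<int>& vec) {
    int total = 0;
    std::for_each(vec.begin(), vec.end(), [&total](int x) { total += x; });
    return total;
}
```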

Matthieu M.