There are a number of operations on iterators that lead to undefined behavior; the goal of this trigger is to activate runtime checks (using asserts) that catch them before they occur.
The issue
The obvious one is using an invalid iterator, but an iterator can be invalid for various reasons:
- Uninitialized iterator
- Iterator to an element that has been erased
- Iterator to an element whose physical location has changed (reallocation for a vector)
- Iterator outside of [begin, end)
The standard specifies, in excruciating detail, which operations invalidate which iterators for each container.
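To make the reallocation case concrete, here is a minimal sketch (the helper name `push_back_will_reallocate` is made up for this illustration). For a `std::vector`, a `push_back` reallocates exactly when the new size would exceed the current capacity, and in that case every iterator, pointer and reference into the vector is invalidated.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: true if the next push_back must reallocate,
// i.e. if it will invalidate every iterator into v.
bool push_back_will_reallocate(const std::vector<int>& v) {
    return v.size() == v.capacity();
}
```

Checking this before a `push_back` is one way to reason about whether iterators held across the call are still usable; the debug checks discussed here automate that kind of bookkeeping.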
There is also a somewhat less obvious cause that people tend to forget: mixing iterators from different containers:
std::vector<Animal> cats, dogs;
for_each(cats.begin(), dogs.end(), /**/); // obvious bug
This pertains to a more general issue: the validity of ranges passed to the algorithms.
- [cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
- [cats.end(), cats.begin()) is invalid (unless cats is empty, in which case begin and end are equal and the range is empty)
The solution
The solution consists in adding information to the iterators so that their validity, and the validity of the ranges they define, can be asserted during execution, thus preventing undefined behavior from occurring.
The _HAS_ITERATOR_DEBUGGING symbol serves as the switch for this capability; it is optional because the checks unfortunately slow down the program. It's quite simple in theory: each iterator is made an Observer of the container it's issued from and is thus notified of modifications.
In Dinkumware this is achieved by two additions:
- Each iterator carries a pointer to its related container
- Each container holds a linked list of the iterators it created
And this neatly solves our problems:
- An uninitialized iterator does not have a parent container, so most operations (apart from assignment and destruction) will trigger an assertion
- An iterator to an erased or moved element has been notified (thanks to the list) and knows that it is invalid
- When an iterator is incremented or decremented, it can check that it stays within bounds
- Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
- Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (linear operation for those containers which are not randomly accessible, thus most of them)
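The two additions can be sketched roughly as follows. This is an illustration only, not Dinkumware's actual code: `CheckedVec` and `CheckedIter` are made-up names, the container only stores `int`, and the only mutation modeled is a `push_back` that reallocates (the sketch ignores that `push_back` also invalidates past-the-end iterators).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CheckedVec;  // hypothetical container, for illustration only

class CheckedIter {
public:
    CheckedIter() = default;  // no parent: behaves as "uninitialized"
    CheckedIter(const CheckedIter& o) { attach(o.parent_, o.index_); }
    CheckedIter& operator=(const CheckedIter& o) {
        if (this != &o) { detach(); attach(o.parent_, o.index_); }
        return *this;
    }
    ~CheckedIter() { detach(); }

    bool valid() const { return parent_ != nullptr; }
    // Same-container check: just compare the parent pointers.
    bool same_container(const CheckedIter& o) const {
        return valid() && parent_ == o.parent_;
    }
    int& operator*() const;
    CheckedIter& operator++();
    bool operator!=(const CheckedIter& o) const {
        assert(same_container(o) && "iterators from different containers");
        return index_ != o.index_;
    }

private:
    friend class CheckedVec;
    CheckedIter(CheckedVec* p, std::size_t i) { attach(p, i); }
    void attach(CheckedVec* p, std::size_t i);  // register in parent's list
    void detach();                              // unregister from parent's list

    CheckedVec* parent_ = nullptr;  // addition 1: pointer to the container
    std::size_t index_ = 0;
    CheckedIter* next_ = nullptr;   // link in the container's iterator list
};

class CheckedVec {
public:
    CheckedVec() = default;
    CheckedVec(const CheckedVec&) = delete;
    CheckedVec& operator=(const CheckedVec&) = delete;
    ~CheckedVec() { invalidate_all(); }

    CheckedIter begin() { return CheckedIter(this, 0); }
    CheckedIter end() { return CheckedIter(this, data_.size()); }

    void push_back(int v) {
        // A reallocation moves every element: notify all live iterators.
        if (data_.size() == data_.capacity()) invalidate_all();
        data_.push_back(v);
    }

private:
    friend class CheckedIter;
    void invalidate_all() {  // O(number of live iterators)
        while (head_ != nullptr) {
            CheckedIter* it = head_;
            head_ = it->next_;
            it->parent_ = nullptr;
            it->next_ = nullptr;
        }
    }
    std::vector<int> data_;
    CheckedIter* head_ = nullptr;  // addition 2: list of issued iterators
};

inline void CheckedIter::attach(CheckedVec* p, std::size_t i) {
    parent_ = p;
    index_ = i;
    if (p != nullptr) { next_ = p->head_; p->head_ = this; }
}

inline void CheckedIter::detach() {
    if (parent_ == nullptr) return;
    CheckedIter** link = &parent_->head_;
    while (*link != this) link = &(*link)->next_;
    *link = next_;
    parent_ = nullptr;
    next_ = nullptr;
}

inline int& CheckedIter::operator*() const {
    assert(valid() && index_ < parent_->data_.size() && "invalid dereference");
    return parent_->data_[index_];
}

inline CheckedIter& CheckedIter::operator++() {
    assert(valid() && index_ < parent_->data_.size() && "increment past end");
    ++index_;
    return *this;
}
```

With this sketch, comparing `cats.begin()` against `dogs.end()` trips the assertion in `operator!=`, and an iterator obtained before a reallocating `push_back` reports `valid() == false` afterwards instead of silently dangling.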
The cost
The cost is heavy, but does correctness have a price? We can break down the cost, though:
- extra memory allocation (the extra list of iterators maintained): O(NbIterators)
- notification process on mutating operations: O(NbIterators) (note that push_back or insert do not necessarily invalidate iterators, but erase does)
- range validity check: O( min(last-first, container.end()-first) )
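The bound on the range check comes from walking forward from first until either last or the end of the container is reached. A minimal sketch of such a check, assuming first is already known to be a valid iterator into c (the name `is_valid_range` is made up for this illustration and is not a library function):

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical check: [first, last) is valid in c if last is reached
// before (or at) c.end() when walking forward from first.
// Precondition: first is a valid iterator into c.
template <class Container>
bool is_valid_range(const Container& c,
                    typename Container::const_iterator first,
                    typename Container::const_iterator last) {
    for (typename Container::const_iterator it = first;; ++it) {
        if (it == last) return true;      // reached last: range is fine
        if (it == c.end()) return false;  // ran off the container first
    }
}
```

For a random-access container the walk can be replaced by a constant-time comparison of positions; for something like `std::list` the linear walk is all there is.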
Most of the library algorithms have of course been implemented for maximum efficiency: typically the check is done once and for all at the beginning of the algorithm, then an unchecked version is run. Yet the program may still slow down severely, especially with hand-written loops:
for (iterator_t it = vec.begin();
it != vec.end(); // Oups
++it)
// body
We know the Oups line is bad taste, but here it's even worse: at each iteration of the loop, we create a new iterator and then destroy it, which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating and deallocating memory in a tight loop?
Of course, a for_each would not encounter such an issue, which is yet another compelling case for using STL algorithms instead of hand-coded versions.
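For comparison, here is a small sketch of the two alternatives to calling end() on every iteration, assuming C++11 (for `auto` and the lambda). The function names are made up for this illustration; both versions create the end iterator only once.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hand-written loop with the end iterator hoisted out of the condition.
int sum_hand_written(const std::vector<int>& vec) {
    int total = 0;
    for (auto it = vec.begin(), last = vec.end(); it != last; ++it) {
        total += *it;
    }
    return total;
}

// Same computation through std::for_each: end() is evaluated once,
// when the arguments of the call are built.
int sum_for_each(const std::vector<int>& vec) {
    int total = 0;
    std::for_each(vec.begin(), vec.end(), [&total](int x) { total += x; });
    return total;
}
```

Hoisting is only safe when the body does not modify the container; if it does, the saved end iterator is exactly the kind of stale iterator the debug checks are meant to catch.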