tags:

views:

161

answers:

3

I recently finished fixing a bug in the following function, and the answer surprised me. I have the following function (written as it was before I found the bug):

    void Level::getItemsAt(vector<item::Item>& vect, const Point& pt)
    {
        vector<itemPtr>::iterator it; // itemPtr is a typedef for a std::tr1::shared_ptr<item::Item>
        for(it=items.begin(); it!=items.end(); ++it)
        {
            if((*it)->getPosition() == pt)
            {
                item::Item item(**it);
                items.erase(it);
                vect.push_back(item);
            }
        }
    }

This function finds all Item objects in the 'items' vector that has a certain position, removes them from 'items', and puts them in 'vect'. Later, a function named putItemsAt does the opposite, and adds items to 'items'. The first time through, getItemsAt works fine. After putItemsAt is called, though, the for loop in getItemsAt will run off the end of 'items'. 'it' will point at an invalid Item pointer, and getPosition() segfaults. On a hunch, I changed it!=items.end() to it<items.end(), and it worked. Can anyone tell me why? Looking around SO suggests it might involve erase invalidating the iterator, but it still doesn't make sense why it would work the first time through.

I'm also curious because I plan to change 'items' from a vector to a list, since list's erase is more efficient. I know I'd have to use != for a list, as it doesn't have a < operator. Would I run into the same problem using a list?

+5  A: 

You're invoking undefined behavior. All the iterators to a vector are invalidated by the fact that you called erase on that vector. It's perfectly valid for an implementation to do whatever it wants.

When you call items.erase(it);, it is now invalid. To conform to the standard, you must now assume that it is dead.

You invoke undefined behavior by using that invalid iterator in the next call to vect.push_back.

You invoke undefined behavior again by using it as the tracking variable of your for loop.

You can make your code valid by using std::remove_copy_if.

class ItemIsAtPoint : std::unary_function<bool, item::Item>
{
    Point pt;
public:
    ItemIsAtPoint(const Point& inPt) : pt(inPt) {}
    bool operator()(const item::Item* input)
    {
        return input->GetPosition() == pt;
    }
};

void Level::getItemsAt(vector<item::Item>& vect, const Point& pt)
{
    std::size_t oldSize = items.size();
    std::remove_copy_if(items.begin(), items.end(), std::back_inserter(vect), 
        ItemIsAtPoint(pt));
    items.resize(vect.size() - (items.size() - oldSize));
}

You can make this a lot prettier if you are using boost::bind, but this works.

Billy ONeal
When debugging, you should use your STL implementation's debug mode (both STLPort and GNU libstdc++ have a debug mode) which will put debug code in the iterators to raise big red flags (probably throw an exception) whenever you try to invalidly use an iterator that's been invalidated.
Ken Bloom
I guess I'm not making a connection here, but I need the function to remove elements from the vector. How does modifying an element in place help me do that?@Ken Bloom: I'm compiling with g++ and using its '-g' debug flag. Which others should I use?
Max
@Max: The code you posted does not remove elements from the vector. If you want to remove those, you should be using `std::remove_if` or `std::remove_copy_if` instead of handwritten loops.
Billy ONeal
@Billy: I meant he could make his code asymptotically faster than the code that he had already. When I look at your code, I notice that it's not doing quite the same thing that Max's code is doing. I don't have a his code for `putItemsAt`, but I think his version will end up with all of the modified items at the end of the list, and yours will end up with a modification in place.
Ken Bloom
@Max: I re-read your code -- I missed that we were talking about two separate vectors here -- give me a sec to fix my answer.
Billy ONeal
@Max: you have to use `-D_GLIBCXX_DEBUG` in order to get libstdc++'s STL debug mode.
Ken Bloom
@Ken + @Max: I have fixed my answer.
Billy ONeal
Alright. That code is a bit over my head right now, but I think I can figure it out. Thanks for your help, everyone.
Max
@Max: `class ItemIsAtPoint` is an STL functor -- a class which overloads `operator()` in order to act like a function pointer. You pass that to `std::remove_copy_if` to tell `remove_copy_if` when to actually remove and copy. The other arguments to `remove_copy_if` are merely to tell the algorithm which ranges we're using.
Billy ONeal
I don't know about passing the result of `remove_copy_if` to `erase`. The result of the `remove_copy_if` is an OutputIterator -- a `back_inserter` -- whereas the vector `erase` requires a vector `iterator`.
Daniel Trebbien
@Billy ONeal: Thanks for that explanation, it's a good bit clearer now.
Max
@Max: Give Remy the checkmark -- he deserves it. I've screwed up this answer too many times already. @Daniel: Hmm... that surprises me. I thought all the remove-like algorithms returned an iterator to the end of the new logical range -- like `std::remove` and `std::remove_if` do. Should be fixed now.
Billy ONeal
+1. He should be using remove_if and range erase (it would be more efficient typically), but it might be worth pointing out that the erase method invalidates the iterator passed in and *returns* a new, valid iterator.
@stinky: `resize` should be just as efficient as the range form of erase.
Billy ONeal
@Billy ONeal: One more thing (sorry): `vect` and `items` are backward in your code; this code is pushing back `item::Item` objects onto `items` and erasing elements from `vect`, whereas Max has it the other way around. After correcting this, I think that it is a really good idea to resize `items`. And, it would be more efficient: Remy's answer is O(i^2 + i*v*p) worst-case, where i is `items.size()` and v is `vect.size()`, and yours is O(i + i*v*p) worst-case (likely O(i*v*p) for POD types).
Daniel Trebbien
+1  A: 

I'll go with Remy Lebeau's explanation about iterator invalidation, and just add that you can make your code valid and asymptotically faster (linear time, instead of quadratic time) by using a std::list instead of a std::vector. (std::list deletions only invalidate the iterator that was deleted, and insertions don't invalidate any iterators.)

You can also predictibly identify iterator invalidation while debugging by activating your STL implementation's debug mode. On GCC, you do with with the compiler flag -D_GLIBCXX_DEBUG (see some caveats there).

Ken Bloom
+1 for telling me about those flags.
Max
+8  A: 

When you call erase(), that iterator becomes invalidated. Since that is your loop iterator, calling the '++' operator on it after invalidating it is undefined behavor. erase() returns a new valid iterator that points to the next item in the vector. You need to use that new iterator from that point onwards in your loop, ie:

void Level::getItemsAt(vector<item::Item>& vect, const Point& pt) 
{ 
    vector<itemPtr>::iterator it = items.begin();
    while( it != items.end() )
    {
        if( (*it)->getPosition() == pt )
        {
            item::Item item(**it);
            it = items.erase(it);
            vect.push_back(item);
        }
        else
            ++it;
    } 
} 
Remy Lebeau - TeamB
-1: The code you posted does nothing but replace the for loop with the while loop. It's still invalid.
Billy ONeal
@Billy: Not quite. He's corrected the erase statement. The statement "it = items.erase(it)" assigns a new, valid value to it.
Peter Ruderman
@Peter: vect.push_back doesn't invalidate it because vect is a different vector than items.
Ken Bloom
@Ken: I misread the question. I've already corrected my comment.
Peter Ruderman
@Remy and @Ken: -1 -> +1 :)
Billy ONeal
Oh, whoops, I didn't see this got edited. I'll go with this solution, since I don't have change too much of my code around. I'll keep Billy ONeal's solution in mind for the future, though.
Max
@Max: It didn't get edited. I had to edit it to make my -1 vote into a +1 vote. I still think the Algorithm is the better solution though. If nothing else, the algorithm will take linear time while this takes quadradic time.
Billy ONeal
@Max: Keep in mind that Billy's solution is not equivalent. `remove_copy_if` with a `back_inserter` and his `ItemIsAtPoint` predicate will push back `item::Item` objects onto `vect` if `getPosition() == pt`. However, the items that are copied into `vect` are not erased from `items`.
Daniel Trebbien
@Daniel: Oh wow.... I'm an idiot. That's fixed.
Billy ONeal
@Billy ONeal: Fair enough. Given that I understand that algorithm better now, and it's more efficient, I'll change my accepted answer back to yours.
Max