tags:

views:

192

answers:

5

I need to modify an object that has already been inserted into a set. This isn't trivial because the iterator in the pair returned from an insertion of a single object is a const iterator and does not allow modifications. So, my plan was that if an insert failed I could copy that object into a temporary variable, erase it from the set, modify it locally and then insert my modified version.

insertResult = mySet.insert(newPep);
    if( insertResult.second == false )
        modifySet(insertResult.first, newPep);

void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
    Peptide tempPep = (*someIter);
    someSet.erase(someIter);
    // Modify tempPep - this does not modify the key
    someSet.insert(tempPep);            

}

This works, but I want to make my insert more efficient. I tried making another iterator and setting it equal to someIter in modifySet. Then after deleting someIter I would still have an iterator to that location in the set and I could use that as the insertion location.

void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
    Peptide tempPep = (*someIter);
    anotherIter = someIter;
    someSet.erase(someIter);
    // Modify tempPep - this does not modify the key
    someSet.insert(anotherIter, tempPep);            

}

However, this results in a seg fault. I am hoping that someone can tell me why this insertion fails or suggest another way to modify an object that has already been inserted into a set.

The full source code can be viewed at github.

+2  A: 

std::set::insert returns a pair<iterator, bool> as far as I know. In any case, directly modifying an element in any sort of set is risky. What if your modification causes the item to compare equal to another existing item? What if it changes the item's position in the total order of items in the set? Depending on the implementation, this will cause undefined behaviour.

If the item's key remains the same and only its properties change, then I think what you really want is a map or an unordered_map instead of a set.

Peter Ruderman
The modification does not change the items key, so I don't need to worry about undefined behavior, but that is a good point. I'll check out maps, thanks for the tip.
lashleigh
If you read carefully, you'll notice that she's removing and then reinserting the colliding item to avoid the problems you described.
Benson
@Benson: Yes, I did notice that, but the question is can it be done directly? The answer is yes, use a map.
Peter Ruderman
How about the other question: can it be done with minimal modification to the existing code? (i.e. can it be done with a set while maintaining an iterator for cheap inserts?)
Benson
I think switching to a map is a fairly minor change. The only annoying point is that map iterators point to key/value pairs instead of just values. So calls like iterator->Foo() become iterator->second.Foo().
Peter Ruderman
+2  A: 

I agree with Peter that a map is probably a better model of what you are doing, specifically something like map<pep_key, Peptide::Peptide>, would let you do something like:

insertResult = myMap.insert(std::make_pair(newPep.keyField(), newPep));
if( insertResult.second == false )
    insertResult.first->second = newPep;  

To answer your question, the insert segfaults because erase invalidates an iterator, so inserting with it (or a copy of it) is analogous to dereferencing an invalid pointer. The only way I see to do what you want is with a const_cast

insertResult = mySet.insert(newPep);
if( insertResult.second == false )
    const_cast<Peptide::Peptide&>(*(insertResult.first)) = newPep;

the const_cast approach looks like it will work for what you are doing, but is generally a bad idea.

academicRobot
Ah, of course, that should have been obvious. What if I were to initialize another iterator with --someIter, pointing to the preceding element? I do plan to look at maps, since most people agree that would be better practice, but I'm curious. Also, thank you for the code examples.
lashleigh
@lash - yes, if you pass insert an iterator to the position preceding the insert position, insert will be very efficient (good doc [here](http://www.cplusplus.com/reference/stl/set/insert/)). Set iterators are not invalidated by changes to the set (unlike vector iterators, for example), so the approach you describe would work well. This approach is completely safe (checking `it != container.begin()` before `--it`), unlike using const_cast.
academicRobot
I tried this out and it didn't work. Because by saying --it, the iterator was then pointing at the wrong element in the set and erasing the wrong element. So, then I thought well I can just do it++ to erase the correct element again, but that also failed, and it was just too messy.
lashleigh
@lash To be clear, the procedure would be `tempIt = it; --tempIt; mySet.erase(it); mySet.insert(tempIt, ...);`.
academicRobot
@academicRobot That did the trick, I had an unrelated bug that made it seem like it wasn't working at first. Thanks!
lashleigh
+2  A: 

I hope it isn't bad form to answer my own question, but I would like it to be here in case someone else ever has this problem. The answer of why my attempt seg faulted was given my academicRobot, but here is the solution to make this work with a set. While I do appreciate the other answers and plan to learn about maps, this question was about efficiently re-inserting into a set.

void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
    if( someIter == someSet.begin() ) {
        Peptide tempPep = (*someIter);
        someSet.erase(someIter);
        // Modify tempPep - this does not modify the key
        someSet.insert(tempPep);   
    }
    else {
        Peptide tempPep = (*someIter);
        anotherIter = someIter;
        --anotherIter;
        someSet.erase(someIter);
        // Modify tempPep - this does not modify the key
        someSet.insert(anotherIter, tempPep); 
     }
}

In my program this change dropped my run time by about 15%, from 32 seconds down to 27 seconds. My larger data set is currently running and I have my fingers crossed that the 15% improvement scales.

lashleigh
+1. Not bad form at all.
academicRobot
Indeed - not bad form at all. And for precisely the reason you mentioned - if you've got a solution, share it! It will help others if they ever have the same issue.
Mac
I suppose you would insert `newPep` instead ? Otherwise it is not used.
Matthieu M.
What's the chance of taking the first branch? Is this dependent on the size of your input? If it's random, the savings will approach 0% as the set grows, but the cost of the branch will remain fixed. You can effectively merge the two by using `someIter++` as the insertion location; `set.end()` is a valid hint.
MSalters
@MSalters The standard only guarantees O(1) for insertions after the hint (23.1.2), but gcc>4.2 implements before and after, according to [this proposal](http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt07ch17.html), so your idea works for gcc. Do you know if this is true for other common compilers?
academicRobot
As far as I know it's common. It's fairly easy to use a hint when it's not perfect. Also, there are N+1 possible insertion points, and N+1 iterators into a container with N elements. Hence it actually makes more sense to interpret the hint that way. This is consistent with non-ordered containers, where `insert(newElement, iter)` inserts `newElement` before `*iter`
MSalters
+1  A: 

As you realized set are a bit messy to deal with because you have no way to indicate which part of the object should be considered for the key and which part you can modify safely.

The usual answer is to use a map or an unordered_map (if you have access to C++0x) and cut your object in two halves: the key and the satellite data.

Beware of the typical answer: std::map<key_type, Peptide>, while it seems easy it means you need to guarantee that the key part of the Peptide object always match the key it's associated with, the compiler won't help.

So you have 2 alternatives:

  1. Cut Peptide in two: Peptide::Key and Peptide::Data, then you can use the map safely.
  2. Don't provide any method to alter the part of Peptide which defines the key, then you can use the typical answer.

Finally, note that there are two ways to insert in a map-like object.

  • insert: insert but fails if the value already exists
  • operator[]: insert or update (which requires creating an empty object)

So, a solution would be:

class Peptide
{
public:
  Peptide(int const id): mId(id) {}

  int GetId() const;

  void setWeight(float w);
  void setLength(float l);

private:
  int const mId;
  float mWeight;
  float mLength;
};

typedef std::unordered_map<int, Peptide> peptide_map;

Note that in case of update, it means creating a new object (default constructor) and then assigning to it. This is not possible here, because assignment means potentially changing the key part of the object.

Matthieu M.
Thank you for the map explanation, but I don't understand your first sentence; don't I indicate which part of the object is the key by the way I define the comparison function?
lashleigh
Yes and no, the comparison function can be arbitrarily complicated and therefore it would be near impossible for a compiler writer to check which part of the object is part of the key and which is not... and of course what's worse is that: `set` is part of the STL which is a library and not the core language itself, so should not elicit particular support from the compiler; and C++ as no syntax for "partial" const-ness. The usual workaround is to make the members not part of the key `mutable` and accessible via `const` methods, but that's more a hack that anything...
Matthieu M.
A: 

std::map will make your life a lot easier and I wouldn't be surprised if it outperforms std::set for this particular case. The storage of the key might seem redundant but can be trivially cheap (ex: pointer to immutable data in Peptide with your own comparison predicate to compare the pointee correctly). With that you don't have to fuss about with the constness of the value associated with a key.

If you can change Peptide's implementation, you can avoid redundancy completely by making Peptide into two separate classes: one for the key part and one for the value associated with the key.