Hi;

I was using std::merge to combine two sorted collections into one.

But my object has a natural key and defined addition semantics, so what I am after is a merge_and_sum that does not just merge the two collections into a single collection of length N+M: wherever operator== returns true for a pair of elements, it should combine them with operator+ into a single element instead.

I have implemented it thus

template<class _InIt1, class _InIt2, class _OutIt> 
_OutIt merge_and_sum(_InIt1 _First1, _InIt1 _Last1, _InIt2 _First2, _InIt2 _Last2, _OutIt _Dest )
{   // copy merging ranges, both using operator<
    for (; _First1 != _Last1 && _First2 != _Last2; ++_Dest)
    {
        if ( *_First2 < *_First1 )
            *_Dest = *_First2, ++_First2;
        else if ( *_First2 == *_First1)
            *_Dest = *_First2 + *_First1, ++_First1, ++_First2;
        else
            *_Dest = *_First1, ++_First1;
    }
    _Dest = copy(_First1, _Last1, _Dest);   // copy any tail
    return (copy(_First2, _Last2, _Dest));
}

But was wondering if I have reinvented something that is composable from the other algorithms.

Thanks

+2  A: 

It sounds like your collections are like multisets with duplicates collapsed by your + operator (maybe just summing the multiplicities instead of keeping redundant copies). I assume so, because you're not changing the sorting order when you +, so + isn't affecting your key.

You should use your implementation. There's nothing in STL that will do it as efficiently. The closest semantic I can think of is standard merge followed by unique_copy. You could almost get unique_copy to work with a side-effectful comparison operator, but that would be extremely ill advised, as the implementation doesn't promise to only compare things directly vs. via a value-copied temporary (or even a given number of times).

Your type and variable names are unpleasantly long ;)

wrang-wrang
'type and variable names are unpleasantly long', and also not recommended. Names starting with a double underscore, or with an underscore followed by a capital letter, are reserved for the implementation (compiler + STL implementor), and names starting with a single underscore followed by a lower-case letter are reserved for the implementation in the global namespace.
David Rodríguez - dribeas
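
Following the comment above, here is a rough sketch of the same algorithm from the question with non-reserved names. The Entry type, its operators, and the merge_example helper are hypothetical stand-ins invented for illustration; the question does not show the real type.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical value type: `key` is the natural key, `count` is what gets summed.
struct Entry {
    int key;
    int count;
};
bool operator<(const Entry& a, const Entry& b) { return a.key < b.key; }
bool operator==(const Entry& a, const Entry& b) { return a.key == b.key; }
Entry operator+(const Entry& a, const Entry& b) { return Entry{a.key, a.count + b.count}; }

// Same logic as the question's merge_and_sum, without reserved identifiers.
template <class InIt1, class InIt2, class OutIt>
OutIt merge_and_sum(InIt1 first1, InIt1 last1, InIt2 first2, InIt2 last2, OutIt dest)
{
    for (; first1 != last1 && first2 != last2; ++dest) {
        if (*first2 < *first1) {
            *dest = *first2; ++first2;
        } else if (*first2 == *first1) {
            *dest = *first2 + *first1; ++first1; ++first2;  // equal keys: emit the sum once
        } else {
            *dest = *first1; ++first1;
        }
    }
    dest = std::copy(first1, last1, dest);  // copy whichever tail remains
    return std::copy(first2, last2, dest);
}

// Hypothetical usage helper: merges two small sorted ranges.
std::vector<Entry> merge_example()
{
    std::vector<Entry> a = {{1, 10}, {3, 30}, {5, 50}};
    std::vector<Entry> b = {{2, 2}, {3, 3}, {6, 6}};
    assert(std::is_sorted(a.begin(), a.end()));  // both inputs must already be sorted
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<Entry> out;
    merge_and_sum(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

With these inputs the output has five elements, and the shared key 3 appears once with count 33.
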
A: 

Well, your other option would be to use set_symmetric_difference to get the elements that appear in only one of the sets, then call set_intersection twice (once with each set as the first range) to get each set's copy of the shared elements. Then add the matching copies together and insert the sums into the difference set.

typedef set<MyType, MyComp> SetType;
SetType merge_and_add(const SetType& s1, const SetType& s2)
{
    SetType diff;
    set_symmetric_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), inserter(diff, diff.end()));
    vector<SetType::value_type> same1, same2;
    set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(), back_inserter(same1));
    set_intersection(s2.begin(), s2.end(), s1.begin(), s1.end(), back_inserter(same2));
    transform(same1.begin(), same1.end(), same2.begin(), inserter(diff, diff.begin()), plus<SetType::value_type>());
    return diff;
}

Side note! You should stick to one comparison: either use operator==, in which case you should use an unordered_set, or use operator< with a regular set. A set requires a strict weak ordering, which means two entries are deemed equivalent if !(a < b) && !(b < a). So even if your two objects are unequal by operator==, the set will consider them duplicates if they satisfy this condition. For the function you supplied above, I highly recommend refraining from using an == comparison.
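
To illustrate the side note, here is a small sketch. The Item type and the helper name are made up for this example; operator== deliberately looks at more fields than operator< does.

```cpp
#include <cassert>
#include <set>

// Hypothetical type: ordered by `key` only, but operator== also compares `count`.
struct Item {
    int key;
    int count;
};
bool operator<(const Item& a, const Item& b) { return a.key < b.key; }
bool operator==(const Item& a, const Item& b) { return a.key == b.key && a.count == b.count; }

// Returns true if std::set rejects the second insert as a duplicate,
// even though operator== says the two objects differ.
bool set_treats_as_duplicate()
{
    std::set<Item> s;
    Item first{1, 10};
    Item other{1, 99};
    assert(!(first == other));               // unequal according to operator==
    s.insert(first);
    bool inserted = s.insert(other).second;  // set only consults operator<
    return !inserted;
}
```

Here the set never calls operator==; it decides based on !(a < b) && !(b < a) alone, so the second insert is refused.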

rlbond
+1  A: 

You could use std::merge with an output iterator of your own creation, which does the following in operator=. I think this ends up making more calls to operator== than your version, though, so unless it works out as less code it's probably not worth it.

if ((mylist.size() > 0) && (newvalue == mylist.back())) {
    mylist.back() += newvalue;
} else {
    mylist.push_back(newvalue);
}

(Actually, writing a proper output iterator might be more fiddly than that, I can't remember. But I hope you get the general idea).

mylist is a reference to the collection you're merging into. If the target doesn't have back(), then you'll have to buffer one value in the output iterator, and only write it once you see a non-equal value. Then define a flush function on the output iterator to write the last value, and call it at the end. I'm pretty sure that in this case it is too much mess to beat what you've already done.
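
For what it's worth, a minimal sketch of such an output iterator for a target that does have back() might look like the following. The names summing_back_inserter, Tally, and merge_via_iterator are invented here, and the value type is assumed to provide operator==, operator< and operator+=.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical output iterator: appends to the container, or folds the value
// into the last element when the two compare equal.
template <class Container>
class summing_back_inserter {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit summing_back_inserter(Container& c) : target_(&c) {}

    summing_back_inserter& operator=(const typename Container::value_type& v) {
        if (!target_->empty() && v == target_->back())
            target_->back() += v;    // same key as the previous output: sum
        else
            target_->push_back(v);   // new key: append
        return *this;
    }
    // Dereference and increment are no-ops, as with std::back_insert_iterator.
    summing_back_inserter& operator*() { return *this; }
    summing_back_inserter& operator++() { return *this; }
    summing_back_inserter operator++(int) { return *this; }

private:
    Container* target_;
};

// Hypothetical value type with the operators the iterator relies on.
struct Tally {
    int key;
    int count;
};
bool operator<(const Tally& a, const Tally& b) { return a.key < b.key; }
bool operator==(const Tally& a, const Tally& b) { return a.key == b.key; }
Tally& operator+=(Tally& a, const Tally& b) { a.count += b.count; return a; }

// Hypothetical usage helper: plain std::merge, with the summing done by the iterator.
std::vector<Tally> merge_via_iterator()
{
    std::vector<Tally> a = {{1, 10}, {3, 30}};
    std::vector<Tally> b = {{3, 3}, {4, 4}};
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<Tally> out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               summing_back_inserter<std::vector<Tally>>(out));
    return out;
}
```

Note that this variant also collapses duplicates that occur within a single input range, and would fold the first merged value into the last element of a non-empty target, which the hand-written loop in the question does not do.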

Steve Jessop