tags:

views:

2100

answers:

4

I have two STL containers that I want to merge, removing any elements that appear more than once. For example:

typedef std::list<int> container;
container c1;
container c2;

c1.push_back(1);
c1.push_back(2);
c1.push_back(3);

c2.push_back(2);
c2.push_back(3);
c2.push_back(4);

container c3 = unique_merge(c1, c2);
// c3 now contains the following 4 elements:
//   1, 2, 3, 4

std::unique seems to be for adjacent elements only, and in my case the containers could be in any order. I could do some std::set trickery I guess:

container unique_merge(const container& c1, const container& c2)
{
    std::set<container::value_type> s;
    BOOST_FOREACH(const container::value_type& val, c1)
        s.insert(val);
    BOOST_FOREACH(const container::value_type& val, c2)
        s.insert(val);
    return container(s.begin(), s.end());
}

Is there a better way or have I missed something bleeding obvious?

A: 

Can't you use std::merge to merge them and then remove duplicates? It does require the containers to be sorted, though.

Nemanja Trifunovic
The std::set_union algorithm does this already.
Uhall
+4  A: 

For an unordered lists, your set trick is probably one of the best. It each insert should be O(log n), with N inserts required, and traversing will be O(n), giving you O(N*log n). The other option is to run std::sort on each list individually and then walk through them in parallel using std::set_union, which removes duplicates for you. This will also be O(n*log n), so if you're worried about performance, you'll have to profile. If you're not, do whichever makes more sense to you.

Edit: set_union will only work if there are no duplicates in the original lists, otherwise you'll have to go with sort, merge, unique and erase. The big O performance is still the same, with the same caveats about profiling.

template <typename container>
container unique_merge(container c1, container c2)
{
    std::sort(c1.begin(), c1.end());
    std::sort(c2.begin(), c2.end());
    container mergeTarget;
    std::merge(c1.begin(), c1.end(), c2.begin(), c2.end(), 
        std::insert_iterator(mergeTarget, mergeTarget.end())
    );
    std::erase(
        std::unique(mergeTarget.begin(), mergeTarget.end()), 
        mergeTarget.end()
    );

    return mergeTarget;
}
Eclipse
According to the specification for std::set_union: If there are duplicate elements in the two ranges, R1 and R2, say V occurs N times in R1 and M times in R2, the result of std::set_union will contain max(N, M) instances of V. So unless N<=1 and M<=1 it's not a correct solution.
Andreas Magnusson
Your code sorts 2 const containers. That won't even compile.
Chris Morley
That's what I get for not test-compiling it.
Eclipse
+1  A: 

Use the std::set_union algorithm from the STL. You'll need to sort your input lists first though -- or create copies of your input lists, sort them, then use std::set_union.

Uhall
A: 

You are going to need to either sort (either explicitly, or implicitly via a sorted container like set).

There is a common idiom using std::sort/std::unique/std::erase to get unique elements in a container.

So create a container with the contents of c1, append the contents of c2, then sort, move unique elements to the end, and erase them. Something like this:

container c(c1.begin(), c1.end());
c.insert(c.end(), c2.begin(), c2.end());
c.erase(std::unique(c.begin(), c.end()), c.end());
Chris Morley