views:

36

answers:

3

What's the return of the algorithm std:set_union when one or both input containers are multisets with duplicated objects? Do dups get lost?

Let's suppose for example:

multiset<int> ms1;
ms1.insert(1);
ms1.insert(1);
ms1.insert(1);
ms1.insert(2);
ms1.insert(3);

multiset<int> ms2;
ms2.insert(1);
ms2.insert(1);
ms2.insert(2);
ms2.insert(2);
ms2.insert(4);

vector<int> v(10);
set_union( ms1.begin(), ms1.end(), ms2.begin(), ms2.end(), v.begin() );

What would the output be?

+1  A: 

From the documentation of std::set_union (emphasis added).

In the simplest case, set_union performs the "union" operation from set theory: the output range contains a copy of every element that is contained in [first1, last1), [first2, last2), or both. The general case is more complicated, because the input ranges may contain duplicate elements. The generalization is that if a value appears m times in [first1, last1) and n times in [first2, last2) (where m or n may be zero), then it appears max(m,n) times in the output range. [1] Set_union is stable, meaning both that the relative order of elements within each input range is preserved, and that if an element is present in both input ranges it is copied from the first range rather than the second.

It will appear max(m,n) times where m is the number of times it occurs in ms1 and n is the number of times it occurs in ms2.

Stephen
+1  A: 

From the standard, 25.3.5:

The semantics of the set operations are generalised to multisets in a standard way by defining union() to contain the maximum number of occurrences of every element, intersection() to contain the minimum, and so on.

So in your example, the result will be (1,1,1,2,2,3,4,0,0,0), since you initialised the vector with length 10.

Mike Seymour
+1  A: 

From the C++ standard, §25.3.5/1:

This section defines all the basic set operations on sorted structures. They also work with multisets (23.3.4) containing multiple copies of equivalent elements. The semantics of the set operations are generalized to multisets in a standard way by defining union() to contain the maximum number of occurrences of every element, intersection() to contain the minimum, and so on.

Jerry Coffin