views:

98

answers:

2

I'd like an efficient method for doing the inplace union of a sorted vector with another sorted vector. By inplace, I mean that the algorithm shouldn't create a whole new vector or other storage to store the union, even temporarily. Instead, the first vector should simple grow by exactly the number of new elements.

Something like:

void inplace_union(vector & A, const vector & B);

Where, afterwards, A contains all of the elements of A union B and is sorted.

std::set_union in <algorithm> wont work because it overwrites its destination, which would be A.

Also, can this be done with just one pass over the two vectors?

Edit: elements that are in both A and B should only appear once in A.

+1  A: 

Yes, this can be done in-place, and in O(n) time, assuming both inputs are sorted, and with one pass over both vectors. Here's how:

Extend A (the destination vector) by B.size() - make room for our new elements.

Iterate backwards over the two vectors, starting from the end of B and the original end of A. If the vectors are sorted small → big (big at the end), then take the iterator pointing at the larger number, and stick it in the true end of A. Keep going until B's iterator hits the beginning of B. Reverse iterators should prove especially nice here.

Example:

A: [ 1, 2, 4, 9 ]
B: [ 3, 7, 11 ]

* = iterator, ^ = where we're inserting, _ = unitialized
A: [ 1, 3, 4, 9*, _, _, _^ ]   B: [ 3, 7, 11* ]
A: [ 1, 3, 4, 9*, _, _^, 11 ]  B: [ 3, 7*, 11 ]
A: [ 1, 3, 4*, 9, _^, 9, 11 ]  B: [ 3, 7*, 11 ]
A: [ 1, 3, 4*, 9^, 7, 9, 11 ]  B: [ 3*, 7, 11 ]
A: [ 1, 3*, 4^, 4, 7, 9, 11 ]  B: [ 3*, 7, 11 ]
A: [ 1, 3*^, 3, 4, 7, 9, 11 ]  B: [ 3, 7, 11 ]

Super edit: Have you considered std::inplace_merge? (which I may have just re-invented?)

Thanatos
A union of sets implies that the result set does not have the same element in it twice... right? You just posted the merge algorithm that is the basis for merge sort.
njamesp
You're answer is closer than I thought. If you have a check for the same element being in both (skipping it), you'd be done... except for now you don't know how much longer to make A.
njamesp
The original post talked of a union of vectors. `std::unique` can fix this after a merge. I would still go with `std::inplace_merge` -- it seems to do exactly what my post does, and combined with `std::unique`, exactly what you want, minus all the hard work.
Thanatos
Yes, but then we have used extra memory expanding A to the size of A plus B (when maybe it can be much smaller), and extra work copying elements from B that are already in A (could be a lot).
njamesp
Thanatos
+4  A: 

I believe you can use the algorithm std::inplace_merge. Here is the sample code:

void inplace_union(std::vector<int>& a, const std::vector<int>& b)
{
    int mid = a.size(); //Store the end of first sorted range

    //First copy the second sorted range into the destination vector
    std::copy(b.begin(), b.end(), std::back_inserter(a));

    //Then perform the in place merge on the two sub-sorted ranges.
    std::inplace_merge(a.begin(), a.begin() + mid, a.end());

    //Remove duplicate elements from the sorted vector
    a.erase(std::unique(a.begin(), a.end()), a.end());
}
Naveen
A union of sets implies that the result set does not have the same element in it twice... right?
njamesp
Yes.. as you can not have duplicate elements in a set. But in this case since we are sorted vectors the duplicates are allowed, however I am not sure whether the relative position of the similar objects is preserved in `inplace_merge` i.e. whether it is *stable* or not.
Naveen
You can remove the duplicates, if applicable, with `std::unique`
Thanatos
I am not worried about stability.
njamesp
Replacing `std::copy` with `std::set_difference` would remove the need for `std::unique`.
njamesp
@user438601: That's why I am not storing an iterator to the end. I am just storing an index to the end. `mid` is a simple `int` not an `iterator`
Naveen
Right, my bad. And since vectors have random access iterators, the + mid is O(1), so I think you've got it if you change to `std::set_difference`.
njamesp
Are you sure? `std::set_difference` would not only remove the duplicates, but the first instance too! (the difference of `[1, 2, 3]` and `[1, 2, 3]` is `[]`) *Edit:* Actually, I think I'm probably wrong here...
Thanatos
Yeah, I am wrong. You want to copy B, minus the elements in A, or B - A, which is what `set_difference` does.
Thanatos
@Naveen: Does this solution meet the requirement " Instead, the first vector should simple grow by exactly the number of new elements.". In this solution, we created more space and then removed duplicates.
Chubsdad
@chubsdad: Right now no, the requirement for having the unique elements was added later. Before that requirement, that condition was satisfied.
Naveen