I see that set_intersection() et al. from the <algorithm> header won't work, as they explicitly require their inputs to be sorted; I guess you ruled them out already.
It occurs to me that the "naive" approach of iterating through hash A and looking up every element in hash B should actually give you near-optimal performance, since successive lookups in hash B will tend to go to the same hash bucket (assuming that both hashes use the same hash function and have the same number of buckets). That should give you decent memory locality, even though these buckets are almost certainly implemented as linked lists.
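To make the lookup-based idea concrete, here's a minimal sketch, assuming both inputs are std::unordered_set with the same template arguments. The name hash_intersection is mine, and iterating the smaller of the two sets is an extra tweak I'm assuming is worthwhile, not something measured:

```cpp
#include <unordered_set>

// Hypothetical helper (not part of the standard library): intersects two
// unordered_sets by probing one set for each element of the other.
template <typename T, typename Hash, typename Eq, typename Alloc>
std::unordered_set<T, Hash, Eq, Alloc>
hash_intersection(const std::unordered_set<T, Hash, Eq, Alloc>& a,
                  const std::unordered_set<T, Hash, Eq, Alloc>& b) {
    // Assumption: walking the smaller set means fewer lookups overall.
    const auto& small = a.size() <= b.size() ? a : b;
    const auto& large = a.size() <= b.size() ? b : a;

    std::unordered_set<T, Hash, Eq, Alloc> result;
    for (const auto& item : small) {
        if (large.find(item) != large.end()) {  // hash lookup, O(1) on average
            result.insert(item);
        }
    }
    return result;
}
```

Each lookup goes through the container's own find(), so the cost should be roughly proportional to the size of the smaller set as long as the buckets stay short.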
Here's some code for unordered_set_intersection(); you can tweak it to make the versions for set union and set difference:
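    #include <algorithm>  // std::find

    // Copies to out every element of [b1, e1) that also occurs in [b2, e2).
    // Neither range needs to be sorted; each lookup is a linear std::find.
    template <typename InIt1, typename InIt2, typename OutIt>
    OutIt unordered_set_intersection(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
        while (!(b1 == e1)) {
            if (!(std::find(b2, e2, *b1) == e2)) {
                *out = *b1;
                ++out;
            }
            ++b1;
        }
        return out;
    }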
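As an example of that tweaking, a set difference version might look like the sketch below. It is the same loop with the test inverted; I haven't tuned it, and like the function above it does a linear std::find per element:

```cpp
#include <algorithm>  // std::find

// Sketch of a difference variant: copies to out every element of [b1, e1)
// that does NOT occur in [b2, e2). Neither range needs to be sorted.
template <typename InIt1, typename InIt2, typename OutIt>
OutIt unordered_set_difference(InIt1 b1, InIt1 e1, InIt2 b2, InIt2 e2, OutIt out) {
    for (; !(b1 == e1); ++b1) {
        if (std::find(b2, e2, *b1) == e2) {  // not present in the second range
            *out = *b1;
            ++out;
        }
    }
    return out;
}
```

For a union, inserting both ranges into the destination unordered_set should be enough, since the set discards duplicates on its own.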
Assuming you have two unordered_sets, x and y, you can put their intersection in z using:
    unordered_set_intersection(
        x.begin(), x.end(),
        y.begin(), y.end(),
        std::inserter(z, z.begin())  // from <iterator>
    );
Unlike bdonlan's answer, this will actually work for any key types, and any combination of container types (although using set_intersection() will of course be faster if the source containers are sorted).
NOTE: If bucket occupancies are high, it's probably faster to copy each hash into a vector, sort them, and set_intersection() them there, since searching within a bucket containing n elements is O(n).
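A rough sketch of that copy-and-sort route, assuming the element type is copyable and has operator< (the helper name sorted_intersection is mine):

```cpp
#include <algorithm>  // std::sort, std::set_intersection
#include <iterator>   // std::back_inserter
#include <unordered_set>
#include <vector>

// Hypothetical helper: intersects two unordered_sets by copying them into
// vectors, sorting both, and running std::set_intersection on the results.
template <typename T>
std::vector<T> sorted_intersection(const std::unordered_set<T>& a,
                                   const std::unordered_set<T>& b) {
    std::vector<T> va(a.begin(), a.end());
    std::vector<T> vb(b.begin(), b.end());
    std::sort(va.begin(), va.end());
    std::sort(vb.begin(), vb.end());

    std::vector<T> result;
    std::set_intersection(va.begin(), va.end(), vb.begin(), vb.end(),
                          std::back_inserter(result));
    return result;  // comes out sorted, because both inputs were sorted
}
```

This costs two sorts, so it only pays off when the buckets really are crowded; it also needs an ordering on the keys, which the hash-based versions don't.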