views:

442

answers:

4

I know the STL has set_difference, but I need to just know if 2 sets are disjoint. I've profiled my code and this is slowing my app down quite a bit. Is there an easy way to see if 2 sets are disjoint, or do I need to just roll my own code?

EDIT: I also tried set_intersection but it took the same time...

+3  A: 

You could use set_intersection and test if the resulting set is empty, but I don't know if that's much faster.

The optimal implementation would stop the testing and return false as soon as the first equal element is found. I don't know of any ready-made solution for that, though

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    Set1::const_iterator it, itEnd = set1.end();
    for (it = set1.begin(); it != itEnd; ++it)
        if (set2.count(*it))
            return false;

    return true;
}

isn't too complex and should do the job nicely.

EDIT: If you want O(n) performance, use the slightly less compact

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    Set1::const_iterator it1 = set1.begin(), it1End = set1.end();
    if (it1 == it1End)
        return true; // first set empty => sets are disjoint

    Set2::const_iterator it2 = set2.begin(), it2End = set2.end();
    if (it2 == it2End)
        return true; // second set empty => sets are disjoint

    // first optimization: check if sets overlap (with O(1) complexity)
    Set1::const_iterator it1Last = it1End;
    if (*--it1Last < *it2)
        return true; // all elements in set1 < all elements in set2
    Set2::const_iterator it2Last = it2End;
    if (*--it2Last < *it1)
        return true; // all elements in set2 < all elements in set1

    // second optimization: begin scanning at the intersection point of the sets    
    it1 = set1.lower_bound(*it2);
    if (it1 == it1End)
        return true;
    it2 = set2.lower_bound(*it1);
    if (it2 == it2End)
        return true;

    // scan the (remaining part of the) sets (with O(n) complexity) 
    for(;;)
    {
        if (*it1 < *it2)
        {
            if (++it1 == it1End)
                return true;
        } 
        else if (*it2 < *it1)
        {
            if (++it2 == it2End)
                return true;
        }
        else
            return false;
    }
}

(modified Graphics Noob's modification further, using only operator <)

hjhill
Actually, your implementation of is_disjoint is too complex (in the O(n) sense) - each iteration over set1 does an O(log(n)) lookup for the element in set2, giving a total runtime of O(nlog(n)), or O(nlog(n))/2 on average. This is possible in O(n) time.
Terry Mahaffey
How understandable something is is subjective. How it performs is not.
Terry Mahaffey
@Terry Mahaffey - yes, provided the iteration of both sets is possible in O(1) per step (I have to think about that, sets are usually implemented as some kind of tree structure). The code would be less easy to understand though...
hjhill
@Terry Mahaffey: It depends. Scanning both sets is O(n+m). Scanning only the first list, and probing the second, is O(n log m), which will be faster if n is small and m is large.
Jason Orendorff
+1  A: 

Use std::set_intersection, and see if the output is empty. You could check to see if the range of the two sets (area covered by the begin and end iterators) overlap first, but I suspect that set_intersection might already do this.

These are all O(n) operations, as is is_disjoint. If O(n) is unacceptable, you'll have to build some side storage to "keep track" of the disjoint-ness of the sets as you add/remove elements. Here, you would pay the overhead at insertion time (by updating your "isdisjoint" structure for each set modification), but is_disjoint could be implemented cheaply. This may or may not be a good tradeoff to make.

Terry Mahaffey
@Xavier Nodet - the intersection of two sets contains only the elements contained in both sets, so it can't be larger than the smallest of the two sets.
hjhill
No, I do not mean that. That would be the correct test if you're using std::set_union
Terry Mahaffey
I think `set_intersect` can't perform the 'overlap' test because the inputs to `set_intersect` are input iterators and not bidirectional iterators.
hjhill
+2  A: 
Nikolai N Fetissov
Except, the sets aren't continuous, or in one dimension.
rlbond
Nope - what about {1, 3, 5} and {2, 4, 6}? Clearly disjoint, but with intersecting boundaries...
hjhill
Oh, forgot the 'possibly' qualifier :)
Nikolai N Fetissov
Hmm, @rlbond, what do you mean by not in one dimension? Set elements have to be in a relative order imposed by the less-then operation.
Nikolai N Fetissov
The sets are of 2d points and are sorted lexicographically. So VERY often, you'll have something like set1 = {(1,1),(1,2),(2,1),(2,2)}, set2 = {(1,20),(1,21),(1,22)}, which of course don't "overlap" in 2d but when sorted lexicographically, which is a strict ordering, set2 is completely in set1's boundaries.
rlbond
+7  A: 

Modified hjhill's code to reduce complexity by a factor of O(log n) by getting rid of the count() call.

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    if(set1.empty() || set2.empty()) return true;

    typename Set1::const_iterator 
        it1 = set1.begin(), 
        it1End = set1.end();
    typename Set2::const_iterator 
        it2 = set2.begin(), 
        it2End = set2.end();

    if(*it1 > *set2.rbegin() || *it2 > *set1.rbegin()) return true;

    while(it1 != it1End && it2 != it2End)
    {
        if(*it1 == *it2) return false;
        if(*it1 < *it2) { it1++; }
        else { it2++; }
    }

    return true;
}

I've complied and tested this code now so it should be good.

Graphics Noob
Yes, much better. But you also should optimize for when the sets are obviously disjoint (the .begin of one is greater than the .end of the other) to yield O(1) performance in this case.
Terry Mahaffey
You forgot to initialize the iterators to setN.begin()
hjhill
good catches, updated
Graphics Noob
*it1End and *it2End are undefined - the iterators returned by end() are past-the-end iterators and therefore not dereferenceable.
hjhill
Your "overlap" test contains two more errors: 1. What if one of the sets is empty (*rbegin() is undefined in this case), 2. It should return `true` for the non-overlapping case (because then the two sets **are** disjoint).
hjhill
Line 13, if(*it > *set2.rbegin() || *it2 > *set1.rbegin()) return true;should be if (*it1 > *set2...) ... ?
Yawar
Well, this answers my question -- I guess there's no "easy" way to do this.
rlbond