ansaurus

Question

Curious question : What algorithm does STL set_intersect implement?

Answer 1

+3 A:

STL doesn't require any particular algorithm, it just sets constraints on the algorithmic complexity of certain operations. Since it's all template based, you can easily view the source to your particular implementation to see how it works.

Mark Ransom 2010-10-27 04:56:31

+1. I was starting to write an answer, and copied this, but your answer looks quite good, so I'll paste it here instead. The specific requirement is: "Complexity: At most 2 * ((last1 - first1) + (last2 - first2)) - 1 comparisons."

Jerry Coffin 2010-10-27 05:00:25

Easily view it, sure. Understanding how it works before your head explodes? Now that's a different story.

PigBen 2010-10-27 05:00:28

@PigBen, if you're curious about these kinds of things I assume your head explosion threshold is a bit higher than most. My own threshold is a lot lower than I'd like.

Mark Ransom 2010-10-27 05:03:19

Answer 2

+1 A:

At least in the implementations I've looked at, the implementation is fairly simplistic -- something on this general order:

template <class inIt, class outIt>
outIt set_intersection(inIt start1, inIt end1, inIt start2, inIt end2, outIt out) {
    while (start1 != end1 && start2 != end2) {
       if (*start1 < *start2)
           ++start1;
       else if (*start2 < *start1)
           ++start2;
       else {                 // equal elements.
           *out++ = *start1;
           ++start1;
           ++start2;
       }
    }
    return out;
}

Of course, I'm just typing this off the top of my head -- it probably won't even compile, and certainly isn't pedantically correct (e.g., should probably use a comparator function instead of using operator< directly, and should have another template parameter to allow start1/end1 to be a different type from start2/end2).

From an algorithmic viewpoint, however, I'd guess most real implementations are pretty much as above.

Jerry Coffin 2010-10-27 05:23:46

Answer 3

A:

Interesting. So, the number of comparisons in your algorithm linearly scales with the number of elements in both sets. The Baeza-Yates algorithm goes something like this (note that it assumes both input sets are sorted) :

1) Find the median of set A (A is the smaller set here) 2) Search for the median of A in B. If found, add to the result else, the insertion rank of the median in B is known. 3) Split set A about its median into two parts, and set B about its insertion rank into two parts, and repeat the procedure recursively on both parts. This step works because all elements less than the median in A would intersect only with those elements before the insertion rank of A's median in B.

Since you can use a binary search to locate A's median in B, clearly, the number of comparisons in the this algorithm is lower than the one you mentioned. In fact, in the "best" case, the number of comparisons is O(log(m) * log(n)), where m and n are the sizes of the sets, and in the worst case, the number of comparisons is O(m + n). How on earth did I mess up the implementation this bad? :(

darthshak 2010-10-28 18:22:29

ansaurus

tags:

views:

answers:

Curious question : What algorithm does STL set_intersect implement?

related questions