views:

1246

answers:

7

Let's say I have two arrays:

int ArrayA[] = {5, 17, 150, 230, 285};

int ArrayB[] = {7, 11, 57, 110, 230, 250};

Both arrays are sorted and can be any size. I am looking for an efficient algorithm to find if the arrays contain any duplicated elements between them. I just want a true/false answer, I don't care which element is shared or how many.

The naive solution is to loop through each item in ArrayA, and do a binary search for it in ArrayB. I believe this complexity is O(m * log n).

Because both arrays are sorted, it seems like there should be a more efficient algorithm.

I would also like a generic solution that doesn't assume that the arrays hold numbers (i.e. the solution should also work for strings). However, the comparison operators are well defined and both arrays are sorted from least to greatest.

Thanks!

+28  A: 

Pretend that you are doing a mergesort, but don't send the results anywhere. If you get to the end of either source, there is no intersection. Each time you compare the next element of each, if they are equal, there is an intersection.

For example:

counterA = 0;
counterB = 0;
for(;;) {
    if(counterA == ArrayA.length || counterB == ArrayB.length)
        return false;
    else if(ArrayA[counterA] == ArrayB[counterB])
        return true;
    else if(ArrayA[counterA] < ArrayB[counterB])
        counterA++;
    else if(ArrayA[counterA] > ArrayB[counterB])
        counterB++;
    else
        halt_and_catch_fire();
}
Glomek
In case it's not obvious, this solution is O(n)
Frentos
Or is it O(m + n)?
Imbue
BTW, this will work great with C++ iterators for generic code. It makes me think that STL should already provide a solution...
Imbue
I'm inclined to think O(m+n). m and n can be of wildly-different size (e.g. m = n^2), and O(m+n) is equivalent to O(max(m,n)).
Matt J
James Curran
James: It seems to me that you would have the condition at the top of the loop, and the "return false;" far away, after the end of the loop. One thing that I like about this version is having the condition right next to the return.
Glomek
Nice use of the HCF instruction =) (I guess it would be a compiler intrinsic in this case?)
Adam Rosenfield
Actually, I think this algorithm is O(min(m,n)), since it terminates at the conclusion of the shortest list.Also, I'd simplify the final else-if and else clauses into one else clause by relying on the principle of the excluded middle: if A == B and A < B are false, then A > B must be true.
Dov Wasserman
James Curran
@Imbue: it does, you have std::set and std::set_intersection
Jasper Bekkers
After thinking about the problem for only a couple seconds, I came up with the same thing. Only it would have probably been written in Perl.
Brad Gilbert
+2  A: 

If you don't care about memory consumption, you can achieve good performance by using hash, i.e. create hash with keys = values of one array, and test values of second array against this hash

aku
Hash the smaller of the two arrays to save the most memory. This solution will definitely be blazing fast.
Bill the Lizard
I agree. This is how SQL Server performs a hash join...
Mitch Wheat
This is O(n+m), like the accepted solution.
ephemient
A: 

If the range of values is small, you could build a lookup table for one of them (time cost = O(N)) and then check if the bit is set from the other list (time cost = O(N)). If the range is large, you could do something similar with a hash table.

The mergesort trick from Glomek is an even better idea.

Mr Fooz
A: 

Glomek is on the right track, but kinda glossed over the algorithm.

Start by comparing ArrayA[0] to ArrayB[0]. if they are equal, you're done. If ArrayA[0] is less than ArrayB[0], then move to ArrayA[1]. If ArrayA[0] is more than ArrayB[0], then move to ArrayB[1].

Keeping stepping through till you reach the end of one array or find a match.

James Curran
+1  A: 

If you are using C# 3.0 then why not take advantage of LINQ here?

ArrayA.Intersect(ArrayB).Any()

Not only is this generic (works for any comparable type) the implementation under the hood is pretty efficient (uses a hashing algorithm).

JaredPar
+4  A: 

Since someone wondered about stl. Out-of-the-box, the set_intersection algorithm would do more than you want: it would find all the common values.

    #include <vector>
    #include <algorithm>
    #include <iterator>
    using namespace std;
//    ...    
      int ArrayA[] = {5, 17, 150, 230, 285};
      int ArrayB[] = {7, 11, 57, 110, 230, 250};
      vector<int> intersection;
      ThrowWhenWritten output_iterator;
        set_intersection(ArrayA, ArrayA + sizeof(ArrayA)/sizeof(int),
                         ArrayB, ArrayB + sizeof(ArrayB)/sizeof(int),
                         back_insert_iterator<vector<int> >(intersection));

        return !intersection.empty();

this runs in O(m+n) time, but it requires storing all the duplicates and doesn't stop when it finds the first dup.

Now, modifying the code from the gnu implementation of the stl, we can get more precisely what you want.

 template<typename InputIterator1, typename InputIterator2>
 bool 
 has_intersection(InputIterator1 first1, InputIterator1 last1,
          InputIterator2 first2, InputIterator2 last2)
    {
       while (first1 != last1 && first2 != last2) 
       {
          if (*first1 < *first2)
             ++first1;
          else if (*first2 < *first1)
             ++first2;
          else
             return true;
       }
       return false;
}
David Nehme
Nice and simple, though I would not use the names you copied from GNU, a STL implementation is allowed to use these symbols but a POD (Plain Old Developer) isn't allowed to (double underscores and underscore upper case are resolved for the implementation).
Motti
good point, thanks.
David Nehme
+2  A: 

If one list is much much shorter than the other, binary search is the way to go. If the lists are of similar length and you're happy with O(m+n), a standard "merge" would work. There are fancier algorithms that are more flexible. One paper I've come across in my own searches is:

http://www.cs.uwaterloo.ca/~ajsaling/papers/paper-spire.pdf