views:

61

answers:

5

I am thinking of sorting and then doing binary search. Is that the best way?

+1  A: 

Define "best".

If you want to do it fast, you can do it O(n) by iterating through each array and keeping a count for each unique element. Details of how to count the unique elements depend on the alphabet of things that can be in the array, eg, is it sparse or dense?

Note that this is O(n) in the number of arrays, but O(nm) for arrays of length m).

Charlie Martin
+1  A: 

Iterate through each one and use a hash table to store counts. The key is the value of the integer and the value is the count of appearances.

anonymous_21321
+2  A: 

I advocate for hashes in such cases: you'll have time proportional to common size of both arrays.
Since most major languages offer hashtable in their standard libraries, I hardly need to show your how to implement such solution.

Nikita Rybak
+1  A: 

It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When/if you get to equal elements, you send that as output, and advance to the next item in both arrays. (This assumes, that as you advocated, you've already sorted both, of course).

This is an O(N+M) operation, where N is the size of one array, and M the size of the other. Using a binary search, you get O(N lg2 M) instead, which can be lower complexity if one array is lot smaller than the other, but is likely to be a net loss if they're close to the same size.

Depending on what you need/want, the versions that attempt to just count occurrences can cause a pretty substantial problem: if there are multiple occurrences of a single item in one array, they will still count that as two occurrences of that item, indicating an intersection that doesn't really exist. You can prevent this, but doing so renders the job somewhat less trivial -- you insert items from one array into your hash table, but always set the count to 1. When that's finished, you process the second array by setting the count to 2 if and only if the item is already present in the table.

Jerry Coffin
A: 

The best way is probably to hash all the values and keep a count of occurrences, culling all that have not occurred i times when you examine array i where i = {1, 2, ..., n}. Unfortunately, no deterministic algorithm can get you less than an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.

A faster algorithm would need to either have an acceptable level of probability (Monte Carlo), or rely on some known condition of the lists to examine only a subset of elements (i.e. you only care about elements that have occurred in all i-1 previous lists when considering the ith list, but in an unsorted list it's non-trivial to search for elements.

nearlymonolith