Given two lists (not necessarily sorted), what is the most efficient non-recursive algorithm to find the intersection of those lists?
You could put all elements of the first list into a hash set. Then, iterate the second one and, for each of its elements, check the hash to see if it exists in the first list. If so, output it as an element of the intersection.
First, sort both lists using quicksort : O(n*log(n). Then, compare the lists by browsing the lowest values first, and add the common values. For example, in lua) :
function findIntersection(l1, l2)
i, j = 1,1
intersect = {}
while i < #l1 and j < #l2 do
if l1[i] == l2[i] then
i, j = i + 1, j + 1
table.insert(intersect, l1[i])
else if l1[i] > l2[j] then
l1, l2 = l2, l1
i, j = j, i
else
i = i + 1
end
end
return intersect
end
which is O(max(n, m))
where n
and m
are the sizes of the lists.
EDIT: quicksort is recursive, as said in the comments, but it looks like there are non-recursive implementations
without hashing, I suppose you have two options:
- The naive way is going to be compare each element to every other element. O(n^2)
- Another way would be to sort the lists first, then iterate over them: O(n lg n) * 2 + 2 * O(n)
I got some good answers from this that you may be able to apply. I haven't got a chance to try them yet, but since they also cover intersections, you may find them useful.
Why not implement your own simple hash table or hash set? It's worth it to avoid nlogn intersection if your lists are large as you say.
Since you know a bit about your data beforehand, you should be able to choose a good hash function.
If there is a support for sets (as you call them in the title) as built-in usually there is a intersection method.
Anyway, as someone said you could do it easily (I will not post code, someone already did so) if you have the lists sorted. If you can't use recursion there is no problem. There are quick sort recursion-less implementations.
I second the "sets" idea. In JavaScript, you could use the first list to populate an object, using the list elements as names. Then you use the list elements from the second list and see if those properties exist.
In PHP, something like
function intersect($X) { // X is an array of arrays; returns intersection of all the arrays
$counts = Array(); $result = Array();
foreach ($X AS $x) {
foreach ($x AS $y) { $counts[$y]++; }
}
foreach ($counts AS $x => $count) {
if ($count == count($X)) { $result[] = $x; }
}
return $result;
}
in C++ the following can be tried using STL map
vector<int> set_intersection(vector<int> s1, vector<int> s2){
vector<int> ret;
map<int, bool> store;
for(int i=0; i < s1.size(); i++){
store[s1[i]] = true;
}
for(int i=0; i < s1.size(); i++){
if(store[s2[i]] == true) ret.push_back(s2[i]);
}
return ret;
}
You might want to take a look at Bloom filters. They are bit vectors that give a probabilistic answer whether an element is a member of a set. Set intersection can be implemented with a simple bitwise AND operation. If you have a large number of null intersections, the Bloom filter can help you eliminate those quickly. You'll still have to resort to one of the other algorithms mentioned here to compute the actual intersection, however. http://en.wikipedia.org/wiki/Bloom_filter
From the eviews features list it seems that it supports complex merges and joins (if this is 'join' as in DB terminology, it will compute an intersection). Now dig through your documentation :-)
Additionally, eviews has their own user forum - why not ask there_