tags:
views: 98
answers: 5

If I have N arrays, what is the best way (in terms of time complexity; space is not important) to find the common elements? It is enough to find just one common element and stop.

Edit: The elements are all Numbers.

Edit: These are unsorted. Please do not sort and scan.

This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.

A: 

The most direct method is to intersect the first 2 arrays, then intersect that result with each of the remaining N-2 arrays.

If 'intersection' is not defined in the language you're working in, or you require a more specific answer (i.e. you need to know how to perform the intersection itself), then modify your question accordingly.

Without sorting there isn't an optimized way to do this given the information provided (i.e. sorting positions all elements relative to one another, after which you can iterate over the length of the arrays, checking for elements present in all of them at once).
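In a language with built-in set support, the repeated-intersection idea can be sketched as follows (an illustrative Python sketch, not part of the original answer):

```python
def common_elements(arrays):
    """Fold set intersection over the list of arrays:
    intersect the first two, then each remaining array in turn."""
    if not arrays:
        return set()
    common = set(arrays[0])
    for arr in arrays[1:]:
        common &= set(arr)
        if not common:  # early exit: nothing left in common
            break
    return common
```

Since the question only needs one common element, you can stop at the first member of the result.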

Matt S
Why the downvote?
fbrereto
Downvote because: 1. Intersection is the problem here; telling me to use the intersection provided by the library is not an answer. 2. He is saying that there is no other way to solve it without sorting, when I have clearly shown one way you could do that.
kunjaan
I said there wasn't an OPTIMIZED way to solve it without sorting. Quite obviously (as I answered it without the sorting condition) there are ways to solve without sorting.
Matt S
A: 

I'd start with the degenerate case: finding the common elements between 2 arrays (more on this later). From there I'd have a collection of common values which I'd use as an array itself and compare against the next array. This check would be performed N-1 times, or until the "carry" array of common elements drops to size 0.

One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.

Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N²).
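The divide-and-conquer tree described above might look like this in Python (my sketch, assuming set intersection for the pairwise step):

```python
def dac_common(arrays):
    """Split the list of arrays in half, recursively compute the
    common elements of each half (the tree's lower levels), then
    intersect the two results to move one level up the tree."""
    if len(arrays) == 1:
        return set(arrays[0])
    mid = len(arrays) // 2
    return dac_common(arrays[:mid]) & dac_common(arrays[mid:])
```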

fbrereto
+2  A: 

Create a hash index, with elements as keys and counts as values. Loop through all values and update the count in the index. Afterwards, run through the index and check which elements have count = N. Looking up an element in the index should be O(1), so combined with looping through all M elements (total, across the arrays) this should be O(M).

If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.

Some special cases:

if you know that the elements are (positive) integers with a maximum value that is not too high, you could just use a normal array as the "hash" index to keep counts, where the numbers themselves are the array indices.

I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
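A minimal Python sketch of the counting approach (deduplicating each array with a set stands in for the multiple-occurrence adaptation mentioned above; this is my illustration, not code from the answer):

```python
def find_common(arrays):
    """Count how many arrays each value appears in; values whose
    count equals N are common to all N arrays."""
    counts = {}
    for arr in arrays:
        # dedupe so repeats within one array don't inflate the count
        for v in set(arr):
            counts[v] = counts.get(v, 0) + 1
    n = len(arrays)
    return [v for v, c in counts.items() if c == n]
```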

EDIT: when I answered the question, it did not yet contain the part about wanting "a better way" than hashing.

catchmeifyoutry
Ah......clever!
fbrereto
Variation of counting sort...pretty good.
kunjaan
The final loop can be 'optimized' by looping over the smallest of the N arrays and checking the counts. Otherwise, on the last array, during the insertions into the hash, if the count for the element in the hash is == (N-1), the element can instead be added to a common array, which at the end will be the final list of common elements. Lastly, you could avoid incrementing counters after the first array's ingestion when hash[ele] < (currentArrayNum - 1), which, after removal of hash[ele] = 1 entries, will leave you with elements that are common to all.
Matt S
yup, that could be done, I've noted the same condition in the last sentence. But without that check you get the same behavior, and I don't see the benefit of a test over an increment, both should be atomic operations. Without the test though the index has even added use as an occurrence counter, for free! So I don't see the point of crippling it.
catchmeifyoutry
Just to nit-pick: Looking up an element in a hash-table is actually O(n) (worst-case), though with a perfectly-distributed hash-function it will be O(n/h) average, where h is the size of the hash-table.
BlueRaja - Danny Pflughoeft
In many languages the 'hash' he's referring to here could be done as an associative array. This means the Integer in the original N Arrays is the key (thus collisions cause the counter to increment) which is constant time access, there's no scan. In Java for example the use of a HashMap for put/get is in fact constant time, again no scan.
Matt S
A: 

The question asks if there is a better way than hashing. There is no better way (i.e. better time complexity), since the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one to an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, and then there is the additional log factor for sorting each array.

Back to hashing: from a performance standpoint, you will get the best empirical results by not processing each array fully, but processing only a block of elements from each array before proceeding to the next array. This takes advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases, when common elements appear in the same regions of the arrays (e.g. common elements at the start of all arrays). Worst-case behaviour is no worse than hashing each array in full; merely that all elements are hashed.
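One way to sketch this block-wise processing in Python (illustrative only; the block size and round-robin visiting order are my assumptions, not the answer's):

```python
def first_common_blocked(arrays, block=1024):
    """Hash counts while visiting the arrays round-robin, one block
    at a time, returning as soon as some value has been seen in all
    N arrays. Per-array 'seen' sets guard against duplicates."""
    n = len(arrays)
    counts = {}
    seen = [set() for _ in arrays]
    offsets = [0] * n           # next unread position in each array
    remaining = True
    while remaining:
        remaining = False
        for i, arr in enumerate(arrays):
            start = offsets[i]
            if start >= len(arr):
                continue        # this array is exhausted
            remaining = True
            for v in arr[start:start + block]:
                if v not in seen[i]:
                    seen[i].add(v)
                    counts[v] = counts.get(v, 0) + 1
                    if counts[v] == n:
                        return v  # seen in every array: done early
            offsets[i] = start + block
    return None
```

If the common elements cluster near the front of every array, this returns after touching only the first blocks; in the worst case it hashes everything, matching the plain approach.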

mdma
The original question did NOT ask if there was a better way than hashing. The original question merely asked 'how do you do it', and the most popular answer was hashing. The comment about 'a friend said hashing' was added after the answer was posted.
Matt S
Quote: "Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way." And what's with the attitude? There's no need to shout; we're all intelligent individuals who can read what you write without undue emphasis.
mdma
Ok, so the original question didn't ask this. But when I saw the question that is what I saw. There's little point flaming me for changes to the OP.
mdma
A: 
function commonElement(array of n arrays input)
  head <- shift(input)
  for i <- 0..length(head)-1
    search <- head[i]
    foundInAll <- true
    for j <- 0..length(input)-1
      found <- false
      for k <- 0..length(input[j])-1
        if input[j][k] = search
          found <- true
      if found = false
        foundInAll <- false
        break
    if foundInAll = true
      return search
  return failure

I think the above pseudocode should do what you need. It runs in O(nm^2), where n is the number of arrays and m is the length of each array.
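For comparison, a direct Python translation of this brute-force search (my sketch; `in` on a list is a linear scan, so the complexity matches):

```python
def common_element(input_arrays):
    """Return the first element of the first array that appears in
    every other array, or None ('failure') if there is none."""
    head, rest = input_arrays[0], input_arrays[1:]
    for search in head:
        if all(search in arr for arr in rest):
            return search
    return None
```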

Jakub Hampl