views:

596

answers:

8

Is there any algorithm using comparisons that check array duplication in O(n) time limit?

I.e., Suppose we have a array of type double. Then I need a function like this

bool has_duplicate(double *arr, int len)

that works in O(n) time in the worst case and checks whether it has to equal elements or not.

Please include a proof, if possible.

+6  A: 

If you use a data structure that doesn't allow duplicates, then you can do it in O(1) time and O(n) space.

But for a given array, you can put each element in a hashtable (that way iterating the array once), and getting your O(n) time. Along the way looking for collisions, O(1) lookup (average) for hash tables. Although, you will be using more space in the process.

edit: Worst case O(n)? Then you don't want this, use a bit-set as the data structure, but the principle is the same. Proof? I think the people here have given you enough information for you to explore what a proof of complexity is and a foundation of the data structures to look at first. I see no reason why you can't solve the rest of this on your own. Ask follow-ups if necessary, but indicate you understand this, and you made some god damn effort. I do not wish you a Merry Xmas or Happy Holiday, good day to you sir.

edit: I SAID GOOD DAY!

nlucaroni
Stole my answer =) The hash table is good, but I like your idea of creating a non-unique list then insert all elements once and track the insert failures. O(n) is considered the same as O(2n), so it's a true O(n) solution.
Kieveli
But you get that by expensive inserts: you can't check for duplicates in less than _O(lg n)_.
Charlie Martin
Oh, and _O(n)_ isn't just _considered_ the same, it is the same. Recall the defn: O(f(n)) means ≤ kf(n) for some k. So O(2n) is ≤ k(2n) which is 2kn; but 2k is also a constant. Define k'=2k and there you have it.
Charlie Martin
Charlie, can you elaborate on "can't check for duplicates in less than O(lg n)"? I seems to me that you can't even do better than O(n) because you have to look at every element.
Jules
Sorry, was unclear. The duplicate check is O(lg n) (binary tree) but you have to build the tree first, which puts us back at O(n lg n) -- lg n compares per element, n times to go through the whole array.
Charlie Martin
@Charlie: There is no binary tree, or any tree. It's a hashtable with O(1) lookup.
recursive
+1  A: 

Only in certain circumstances. For example. If you knew the elements in the array could only be numbers between 0 and 9.

You could create a second array of booleans with 10 slots for these possible 10 values. Mark all the booleans false.

Then you could run once through the array you want to test. If you see a number, mark its position (by setting the value to true) in the second 'possible values' array. But before marking it, check if it's already marked. If it was already marked you've got a duplicate. If you get to the end of the array you've tested and no number was already marked in the second 'possible values' array, there's no duplicate.

So, in the general case the answer is "no". But, if you know the possible values that you can have in the array, and you've got enough memory to store a mark for each of these possible values, then "yes".

Scott Langham
+1  A: 

With an assumption that the value are in limited integer range, you can implement something similar to Bucket Sort to check for duplicate.

int bucket[100] = {0};
int val[] = {1,2,33,10,55,1,45,98};

for(int i = 0 ; i < count ; i++)
    if(++bucket[val[i]] > 1)
        // it's duplicate
m3rLinEz
I repeat that I mean a comparison base algorithm
kvphxga
@kvphxga Whoop, my mistake, sorry :{
m3rLinEz
+2  A: 

If you don't mind giving up some space for a hashset you can get pretty darn close (pseudocode follows):

boolean dupeChecker(values[]){
  set = new hashset();
  for(value : values){
    if(set.contains(value)){
      return true;//there is a dupe
    }
    set.put(value)
  }
  return false;//no dupe
}

You could easily modify this to return the index or indices of duplicate items.

GaryF
A: 

If you must check often but rarely insert, the solution is to sort the array. After sorting, you can find duplicates in O(n) by comparing neighbours (element i and i+1). Use binary search to insert new elements or, if you have a setup stage, insert all elements and then sort them once.

The same applies if you need to compare two arrays: Sort them both. Compare them using two indexes i and j. Advance i if array1[i] < array2[j]. If array1[i] == array2[j], then you have found a common element, else advance j.

If either index runs out of elements, there are no duplicates.

Aaron Digulla
+1  A: 
Charlie Martin
+1  A: 

No      

Scott Langham
A: 

http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html

Have fun!

PS: giyf :)

Georg