tags:

views: 106

answers: 4

In an array of integers between 1 and 1,000,000 (or some even larger value), a single value occurs twice. How do you determine which one?

I think we can use a bitmap to mark the elements, and then traverse it all over again to find the repeated element. But I think that is a process with high complexity. Is there any better way?

+1  A: 

This sounds like homework or an interview question ... so rather than giving away the answer, here's a hint.

What calculations can you do on a range of integers whose answer you can determine ahead of time?

Once you realize the answer to this, you should be able to figure it out .... if you still can't figure it out ... (and it's not homework) I'll post the solution :)

EDIT: Ok. So here's the elegant solution ... if the list contains ALL of the integers within the range.

We know that all of the values between 1 and N must exist in the list. Using Gauss's formula we can quickly compute the expected sum of a range of integers:

Sum(1..N) = 1/2 * (1 + N) * Count(1..N).

Since we know the expected sum, all we have to do is loop through all the values and add them up. The difference between this actual sum and the expected sum is the duplicate value.
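
A minimal sketch of this in Java, assuming the list holds every integer from 1 to N exactly once plus the single duplicate (the method name is illustrative):

    static long findDuplicateBySum(int[] a) {
        long n = a.length - 1;            // the list holds 1..n plus one extra element
        long expected = n * (n + 1) / 2;  // Gauss: Sum(1..n)
        long actual = 0;
        for (int v : a) actual += v;      // long keeps the sum from overflowing for large n
        return actual - expected;         // the surplus is the duplicated value
    }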

EDIT: As others have commented, the question doesn't state that the range contains all of the integers ... in this case, you have to decide whether you want to optimize for memory or time.

If you want to perform the operation using O(1) storage, you can sort the list in place and then scan it, checking adjacent elements. Once you see a duplicate, you know you can stop. Optimal sorting is an O(n log n) operation on average, which establishes an upper bound for finding the duplicate in this manner.
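
One way that might look in Java (a sketch; note that the original order of the list is lost):

    static int findDuplicateBySorting(int[] a) {
        java.util.Arrays.sort(a);            // in-place, O(n log n) on average
        for (int i = 1; i < a.length; i++)
            if (a[i] == a[i - 1])
                return a[i];                 // equal neighbours are the duplicate
        throw new IllegalStateException("no duplicate present");
    }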

If you want to optimize for speed, you can use an additional O(n) storage. Using a HashSet (or similar structure), insert values from your list until you try to insert a duplicate. Inserting n items into a HashSet is an O(n) operation on average, which establishes that as an upper bound for this method.
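
A sketch of the same idea in Java, relying on the fact that HashSet.add returns false when the value is already present:

    static int findDuplicateWithSet(int[] a) {
        java.util.HashSet<Integer> seen = new java.util.HashSet<>();
        for (int v : a)
            if (!seen.add(v))                // false: v was inserted before
                return v;
        throw new IllegalStateException("no duplicate present");
    }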

LBushkin
Dividing by zero? :)
belisarius
@LBushkin: now you've done Gauss's classwork for him! Though I think you could simplify by replacing Count(1..N) with N.
High Performance Mark
@belisarius: Where do you believe there would be a division by zero?
LBushkin
That's the solution to a standard problem, but this question doesn't seem to specify that -all- the integers between 1 and N are present, so it's not quite the answer to this question.
Aaron
-1, at least hint how to solve what was asked.
IVlad
@LBushkin It was intended as a joke ... perhaps a bad one (as an operation whose result can be determined ahead of time). BTW I remember doing the same trick with BitXor (also bijective) instead of SUM many years ago
belisarius
@IVlad: If you look at my answer, you'll notice that I did ...
LBushkin
@LBushkin - no, you didn't. You don't know that all the values between 1 and `N` (assuming `N` is the size of the list) must exist in the list. You could have a list of 4 elements: 100 200 5000 5000. Then the repeated element is 5000. All you know is that the array contains integers between 1 and 1,000,000 and that one of them appears twice.
IVlad
-1: Why are you assuming _all_ integers between 1 and N appear? All the problem stated was the upper and lower bounds for the integers in the array. Edit: I see IVlad just said the same thing!
Moron
@belisarius: Yes XOR can be used. There is an O(1) time algorithm to compute 1 XOR 2 XOR 3 .... XOR n.
Moron
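
For reference, one well-known closed form for that computation (a sketch, not from this thread): the running XOR of 1..n cycles with period 4, so it can be evaluated in O(1):

    static int xorUpTo(int n) {              // 1 ^ 2 ^ ... ^ n
        switch (n % 4) {
            case 0:  return n;
            case 1:  return 1;
            case 2:  return n + 1;
            default: return 0;               // n % 4 == 3
        }
    }
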
@Moron, @IVlad: I've updated my answer to reflect the points you've made.
LBushkin
Changed -1 to +1, that answers the question.
IVlad
+1  A: 

You may try to use bits as a hash map:

1 at position k means that number k occurred before

0 at position k means that number k has not occurred before

pseudocode, made concrete here in Java (BitSet plays the role of the C# BitArray class the original alludes to):

    // assume your array is A, with values in 1..1,000,000
    static int findFirstRepeat(int[] A) {
        java.util.BitSet bits = new java.util.BitSet(1_000_001); // starts all zeros
        for (int num : A) {
            if (bits.get(num))
                return num;                  // bit already set: num occurred before
            bits.set(num);                   // mark num as seen
        }
        return -1;                           // no value occurred twice
    }
dfens
-1: @dfens, @LBushkin provides a much better answer: more elegant, likely to be much quicker.
High Performance Mark
+1: btw, LBushkin's answer answers the wrong question.
Moron
+1  A: 

The time complexity of the bitmap solution is O(n), and it doesn't seem like you could do better than that. However, it will take up a lot of memory for a generic list of numbers. Sorting the numbers is an obvious way to detect duplicates, and it doesn't require extra space if you don't mind the current order changing.

jjrv
Sorting, of course, is O(n log n).
David Thornley
A: 

Assuming the array is of length n < N (i.e. not ALL integers are present -- in this case LBushkin's trick is the answer to this homework problem), there is no way to solve this problem using less than O(n) memory with an algorithm that takes just a single pass through the array. This follows by reduction from the set disjointness problem.

Suppose I made the problem easier, and I promised you that the duplicate elements were placed so that the first one was among the first n/2 elements and the second one was among the last n/2 elements. Now think of a game in which two players each hold a string of n/2 elements and want to know how many messages they have to exchange to be sure that none of their elements are the same. Since the first player could simulate the run of any single-pass algorithm on its half of the array and then send the contents of its memory to the second player, a lower bound on the number of messages they need to exchange implies a lower bound on the memory requirements of any such algorithm.

But it's easy to see that in this simple game they need to send n/2 messages to be sure that they don't hold any of the same elements, which yields the lower bound.

Edit: This generalizes to show that any algorithm that makes k passes through the array and uses memory m must satisfy m*k = Omega(n): each pass lets only m bits of information cross between the two halves, and Omega(n) bits must cross in total. And it is easy to see that you can in fact trade off memory for time in this way.

Of course, if you are willing to use algorithms that don't simply take passes through the array, you can do better as suggested already: sort the array, then take one pass through it. This takes time O(n log n) and space O(1). But note, curiously, that this proves that any sorting algorithm that just makes passes through the array must take time Omega(n^2)! Sorting algorithms that break the n^2 bound must make random accesses.

Aaron