views:

332

answers:

8

Size of an array is n.All elements in the array are distinct in the range of [0 , n-1] except two elements.Find out repeated element without using extra temporary array with constant time complexity.

I tried with o(n) like this.

   a[]={1,0,0,2,3};
    b[]={-1,-1,-1,-1,-1};
    i=0;
    int required;
    while(i<n)
    {
      b[a[i]]++;
      if(b[a[i]==1)
        required=a[i];
    }
    print required;

If there is no constraint on range of numbers i.e allowing out of range also.Is it possible get o(n) solution without temporary array.

+2  A: 
  1. Look what is first and last number
  2. Calculate SUM(1) of array elements without duplicate (like you know that sum of 1...5 = 1+2+3+4+5 = 15. Call it SUM(1)). As AaronMcSmooth pointed out, the formula is Sum(1, n) = (n+1)n/2.
  3. Calculate SUM(2) of the elements in array that is given to you.
  4. Subtract SUM(2) - SUM(1). Whoa! The result is the duplicate number (like if a given array is 1, 2, 3, 4, 5, 3, the SUM(2) will be 18. 18 - 15 = 3. So 3 is a duplicate).

Good luck coding!

Vlad Lazarenko
This isn't O(1), as it requires you to iterate over not one, but two arrays. Also, having an array without a duplicate counts as having a temporary array.
Rafe Kettler
@rafe Kettler, you can use the identity Sum(1, n) = (n+1)n/2 for step 2
aaronasterling
I hope you'd optimize step 2... ;-) You should also note that the sum should be unsigned, and it doesn't matter if it wraps modulo `UINT_MAX+1`. Otherwise, the sum of `n` numbers if `O(n)` in space.
R..
@Aaron: I'm pretty sure OP was incorrect about `O(1)` time and really wanted `O(1)` space. Naively, this sum is `O(n)` in space, but it would be just as good to known the sum modulo `M` where `M>=n` since the difference modulo `M` will be sufficient to determine which number was duplicate. Thankfully, a convenient value for `M` is `UINT_MAX+1`. :-)
R..
How do you check for dups in step 2 without a temp array?
sje397
@R, that's a good trick. I don't see why it's needed though. Why is accumulating the sum while looping over the array O(n) in space? It seems like it would only require one `unsigned int` and an index for any size array (up until `UINT_MAX+1` would not suffice either. Is there a way to sum that's more naive than that?
aaronasterling
@Rafe: that is not a temporary array. Summing the numbers 1 through `n-1` does not require first putting them in an array, even if you forget the formula to do it.
R..
Incorrect even ignoring the complexity. The array has n elements with n-1 out of n different values. This means that not only is one of the values from 0 to n-1 duplicated, but a different value in that range is *missing*. But the problem can still be solved with O(1) space (not counting the input) and O(n) time.
aschepler
@Aaron: It's not quite as bad as I stated, especially since the integers being summed are in the range `0` to `n`. But in general, The sum of `n` fixed-size integers is `O(log n)` (not `O(n)` as I previously wrote) in space.
R..
@Vlad Lazarenko:consider the when element 5 is repeated.Does your logic works ?
Jagan
@aschepler: You might be onto something. I read the problem as the values being taken from the range 1 through `n-1`, but indeed it says 0 through `n-1`. It seems like there should be a missing element in addition to a duplicated element.
R..
@Jagan: Yes. `1+2+3+4+5=15. 1+2+3+4+5+5=20. 20-15=5. 5 is a duplicate.`
Vlad Lazarenko
@Vlad: If `n=6`, then the values are taken from the range 0 to 5. What if another value had been omitted instead of the 0 being omitted?
R..
@Vlad Lazarenko:1+2+3+4+5=15. 1+1+2+3+4=11. 15-11=4. Is 4 duplicate ?
Jagan
@Jagan: You are breaking 'distinct range' requirement here.
Vlad Lazarenko
@R.. Good point. 0 can mess up stuff here. Don't know how to deal with that yet. Gotta work tomorrow so I go to bed.
Vlad Lazarenko
@Jagan: With Vlad's algorithm, S1=1+2+3+4=10 and S2=1+1+2+3+4=11. S2-S1=1. However it only works because the omitted element was 0.
R..
@R.. OK, the final idea is that if ARRAY[ABS(SUM2-SUM1)] == 0, then answer is 0, otherwise ABS(SUM2-SUM1). Now I can sleep :)
Vlad Lazarenko
In general, S2-S1 is equal to the duplicated element minus the missing element. I don't see how knowing this difference helps you quickly find the value of the duplicated element, however.
R..
@Vlad: 99% sure that's wrong.
R..
@Vlad Lazarenko: No. The approach you suggested does not work, regardless of how you spin it. One value `A` is missing, another value `B` is duplicated (basically, substituted instead of `A`). That will increase the sum by `B - A`. That's what you find by your algorithm: the `B - A` value. However, knowing `B - A` is still not enough to determine neither `B` nor `A`. Knowing that the sum has increased by 2 would mean that 5 was replaced by 7 (7 is duplicated), or that 1 was replaced by 3 (3 is duplicated) or lots of other possibilities.
AndreyT
(By "increase" I mead "change": the value of `B - A` can be negative, of course).
AndreyT
See my answer. It's based on Vlad's, but fixing up the problem that you only know the difference of the missing and duplicate numbers and not their values.
R..
A: 

Lazy solution: Put the elements to java.util.Set one by one by add(E) until getting add(E)==false.

Sorry no constant-time. HashMap:O(N), TreeSet:O(lgN * N).

卢声远 Shengyuan Lu
That might not be O(1)
Utkarsh Sinha
O(n) (assuming `java.util.Set` has O(1) cost.)
aaronasterling
@AaronMcSmooth - I don't see how it could be O(1)
sje397
@sje, I was a little sloppy, O(1) insert cost. Python's sets and dicts have O(1) for average case so it's certainly possible. And I'm referring to time complexity because that's what OP originally said. It's obvious that this breaks the requirement for O(1) additional space.
aaronasterling
Depends on whether the Set implementation is a HashSet or a TreeSet. With a TreeSet, you have O(lg n) insert cost, but presumably here you would use a HashSet, which should have O(1) amortized insert cost, if tell it the expected capacity (which you know) and possibly the load factor (though the default likely will suffice)
James
@AaronMcSmooth O(1) per element, but you have to insert `n` elements.
Mark B
@Mark which means O(n) which is exactly what I said.
aaronasterling
Only Java fans would really *get into* discussing data structures that are nowhere near `O(1)` space as part of an in-place-computation puzzle...
R..
@James: you are right, it should be depended on whether the Set implementation.
卢声远 Shengyuan Lu
@R Leave it to non java fans to be egotistical about their language choice? This is a traditional question. My favorite (hacky) answer is still: for (int i = 0; i < array.length; i++) { if (array[i] < 0) { return array[i]; // Dupe, we already negated this one } array[i] = -array[i];}Voila. Constant space, linear time.
James
@James: nope, that's not constant space. It's `O(n)` space because you're assuming your input buffer has extra unused writable bits that you can use as temporary space. Whatever the size of your data type is, I can make `n` large enough that you don't have an extra bit.
R..
@R Damn you and your logic! I accept your argument, but only if you bother to define the original array as unsigned, which most of the people asking don't think to do in advance (hence it generally sufficing, since they're giving N between 0 and some number :)
James
Here's where I'm supposed to make another cheap stab at Java for not having unsigned types.... ;-)
R..
+1  A: 

Pick two distinct random indexes. If the array values at those indexes are the same, return true.

This operates in constant time. As a bonus, you get the right answer with probability 2/n * 1/(n-1).

sje397
You mean `2/(n*(n-1))`. Better than you thought!
aschepler
@aschepler - ah yes, you're right :)
sje397
How is random constant? The number of expected lookups into the array increases with the array size, which is also known as `O(n)`
matt b
I'm only picking 2 indexes, once. It doesn't usually work - that's the point. An O(1) algorithm that always works is impossible.
sje397
ah, I was assuming you kept trying until you returned true (or checked every index)
matt b
A: 

Build a lookup table. Lookup. Done.

Non-temporary array solution:

Build lookup into gate array hardware, invoke.

Hogan
Wouldn't that involve a temporary array?
sje397
Huge temporary storage for sure... but it is constant size!
Hogan
+2  A: 

XOR all the elements together, then XOR the result with XOR([0..n-1]).

This gives you missing XOR repeat; since missing!=repeat, at least one bit is set in missing XOR repeat.

Pick one of those set bits. Iterate over all the elements again, and only XOR elements with that bit set. Then iterate from 1 to n-1 and XOR those numbers that have that bit set.

Now, the value is either the repeated value or the missing value. Scan the elements for that value. If you find it, it's the repeated element. Otherwise, it's the missing value so XOR it with missing XOR repeat.

MSN
Another `O(n log n)` approach, but completely different from mine. :-)
R..
@R.. Are you sure this isn't linear complexity?
Mark B
Actually I think you're right. I read it as an iterative approach where you peel off a bit at a time, but indeed there are only 2 passes over the array and 2 passes over the values `1` to `n-1`. So I think this is the answer!
R..
The last pass can be eliminated. When you are making the first pass, calculate the sum `S` (per Vlad) together with your XOR. The sign of the `S - SUM(0...n-1)` will tell you whether the missing is greater or smaller than the repeated. When you are making the second pass, you can also independently XOR the numbers that has that bit *not set* (and the same for numbers in `0...n-1` range). That way you will determine the *second* number as well. I.e. now you know both missing and repeated. Since you know whether the repeated is greater or smaller, you can tell which one is repeated right away.
AndreyT
@AndreyT: I like your first suggestion, but the second suggestion seems like a waste of time in the loop. It's easier just to do a single XOR operation at the end to get the other value.
R..
@R..: Yes, you are right. Since we know `missing XOR repeat` and we also know one of them (either `missing` or `repeat`), then a simple XOR will give us the other. No need for a XOR in the loop.
AndreyT
+1  A: 

O(n) without the temp array.

a[]={1,0,0,2,3};
i=0;
int required;
while(i<n)
{
  a[a[i] % n] += n;
  if(a[a[i] % n] >= 2 * n)
    required = a[i] % n;
}
print required;

(Assuming of course that n < MAX_INT - 2n)

sje397
I consider this temporary space.
R..
@R, how do you consider this to use temp space?
AShelly
A: 

The best I can do is O(n log n) in time and O(1) in space:

The basic idea is to perform a binary search of the values 0 through n-1, passing over the whole array of n elements at each step.

  1. Initially, let i=0, j=n-1 and k=(i+j)/2.
  2. On each run through the array, sum the elements whose values are in the range i to k, and count the number of elements in this range.
  3. If the sum is equal to (k-i)*(k-i+1)/2 + i*(k-i+1), then the range i through k has neither the duplicate nor the omitted value. If the count of elements is less than k-i+1, then the range has the omitted value but not the duplicate. In either case, replace i by k+1 and k by the new value of (i+j)/2.
  4. Else, replace j by k and k by the new value of (i+j)/2.
  5. If i!=j, goto 2.

The algorithm terminates with i==j and both equal to the duplicate element.

(Note: I edited this to simplify it. The old version could have found either the duplicate or the omitted element, and had to use Vlad's difference trick to find the duplicate if the initial search turned up the omitted value instead.)

R..
A: 

Based on @sje's answer. Worst case is 2 passes through the array, no additional storage, non destructive.

O(n) without the temp array.

a[]={1,0,0,2,3};
i=0;
int required;
while (a[a[i] % n] < n)    
   a[a[i++] % n] += n;

required = a[i] % n;
while (i-->0)
   a[a[i]%n]-=n;

print required;

(Assuming of course that n < MAX_INT/2)

AShelly