views:

3938

answers:

34

Possible Duplicate:
Algorithm: How to tell if an array is a permutation in O(n)?

I was asked this interview question at Microsoft, and it's been bugging me that I haven't been able to solve it. Does anybody have any ideas?

You're passed an array with n integers in it. Each integer is between 1 and n (inclusive). These numbers are not in sorted order and there may be duplicates or missing numbers.

The problem is to figure out if there are any duplicates in the array in O(n) time and O(1) space without destroying the array.

A: 

You could use a bitmask variable v=0, and set, for each number found i,

v |= 1<<(i-1)

At the end, check if

v == (1<<n)-1

This works for n<64 for a long v, and can be extended easily for a greater n
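A minimal sketch of this check (Python; `covers_all` is my name for it, and it assumes n < 64 so the mask fits in one machine word):

```python
def covers_all(nums):
    """Return True iff nums contains every value 1..n exactly once."""
    n = len(nums)
    v = 0
    for i in nums:
        v |= 1 << (i - 1)   # mark value i as seen
    # All n low bits are set iff every value appeared; a duplicate
    # forces some other value to be missing, leaving a bit unset.
    return v == (1 << n) - 1
```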

This isn't really O(1) space, but then neither is the n(n+1)/2 approach!

djjeck
actually I don't like this answer, but it's better than nothing =)
djjeck
This solution doesn't work because it's O(n) space.
Jessica
That's true, but the assumption that `n<64` seems somehow more limiting than that `n<2**64`
sepp2k
agreed, not a valid solution =)
djjeck
+10  A: 
stakx
It would take O(nk) to sort with radix sort, but here, k is equal to log(n). I suspect you're right about a solution not existing, but I'm not 100% sure. It is possible if there is only 1 number missing (duplicated) or really any finite number of numbers missing (duplicated), so it's not outside the realm of possibility that there is a solution.
Jessica
To the downvoter, I'd appreciate it if you would state *why* you think the answer wrong, or unhelpful.
stakx
I guess the downvoters don't see it as a proof that there is no answer. For example, on the TV program "MythBusters", they try to do something, fail, and then say they've busted the myth. So if they tried to build an airplane and failed, would they say, "we busted the myth that any man-made machine can fly"?
動靜能量
@動靜能量, hehe! I see your point. Still, the question asked was, "Does anybody have any ideas?" This can go both ways. To me it seemed more promising to first reflect about whether the problem can be solved at all, before I'd start thinking about hundreds of potential solutions. But of course my answer isn't thorough enough to prove that no solution exists. It's only a tentative start in that direction.
stakx
Marcelo Cantos's solution shows that it can be solved with arbitrarily high success probability. And it's easy to get the probability beyond the error rate of any physical CPU, so it is solved in practice.
CodeInChaos
What is arbitrarily high? Compared to n being what? What if n is 2^10000000? Then the chance of success will not be so high. You have to understand that in math, the limit of x / y with both x and y approaching infinity is indeterminate. You can't say x can be arbitrarily high and therefore x / y must be zero. Then you might say, how can n be so high, no computer can handle it; well, what value of n is high enough to be practical? Will that still be true 10 years later, or 20? So your answer may be practical, but it is not pure computer science.
動靜能量
Checking whether the sum is n(n+1)/2 doesn't pass the O(1) space requirement, as it takes O(log n) space to store the result of the arithmetic.
uckelman
A: 

You can probably try this code (this is Python):

integers = [1, 4, 5, 3, 2]

total = 0

for i in integers:
    total += 2 ** (i - 1)

if total != 2 ** len(integers) - 1:
    print('Duplicate')
else:
    print('Perfetto')

Which only runs through len(integers) once, O(n), and uses a single accumulator for O(1) space

Antoine Pelisse
The memory used is not `O(1)` and the time isn't `O(n)` either as exponentiation is not `O(1)`.
IVlad
This is O(n) space. You're just hiding it by using bignums.
Marcelo Cantos
This is the same as [djjeck’s proposal](http://stackoverflow.com/questions/4058172/tricky-algorithm-interview-question/4058238#4058238): A bit map.
Gumbo
Yes it is, then it's O(lg n) in space, and it's still O(n) in time as exponentiation IS O(1) as long as you count in binary. Then it's probably not possible to use O(1) in space ...
Antoine Pelisse
-1 for stating exponentiation is O(1). It's only O(1) if the base is 2 for the fundamental data types. If `i` is `1000`, you're not going to be able to calculate `2^1000` in `O(1)`, nor add them all together in `O(1)`.
IVlad
This is probably not the same order of magnitude and probably can't be counted in a loop: by your reasoning, it would be virtually impossible to have an O(n) algorithm, as addition in a loop would make it bigger than O(n)
Antoine Pelisse
Maybe we need to reconsider the original question: it's probably "how can you make sure there is no duplicate, reading all the integers only once and without saving all read values". It's probably not "is there an O(1) algorithm for sum/multiply/whatever operation you want"
Antoine Pelisse
@Antoine Pelisse - you can assume that addition of two 32-bit integers is `O(1)` as per the OP's clarifications. What you cannot assume is that addition of `n` 32-bit integers **without overflow** is `O(n)`, because you might have to implement your own data type to store the result, since it might be too big for the fundamental types. You don't have to worry about saving the values yourself obviously, you need to worry about writing a function that **is fed the values** and **computes the answer in O(n)**.
IVlad
@IVlad: The addition of 'n' integers which are of a size large enough to hold 'n' takes time O(n), since the resulting total can be at most N^2 and will thus fit in an integer twice as big as the original numbers.
supercat
@supercat - yes, but he is adding powers of two, which stop fitting in any 32 bit integer when the exponent reaches 32.
IVlad
@IVlad: Your statement was about adding 'N' 32-bit integers. If the integers are size 'N', then that will clearly become O(N^2).
supercat
+21  A: 

I found one solution, but it borders on cheating. Technically I don't destroy the array if I restore it afterwards to the original state. But of course it's bad style and fails in multithreaded code. And it requires MaxInt>=2*n+1.

I abuse one bit of each of the ints in the array as my n-bit bitvector, to record whether I have already encountered each number. And when I'm done I strip the bits out again.

//Empty last bit
for(int i=0;i<n;i++)
{
   a[i]*=2;
}

//Fill last bit with my data
for(int i=0;i<n;i++)
{
   int num=a[i]/2-1;
   a[num]|=1;
}

bool result=false; 
//Restore the array and check for missing number
for(int i=0;i<n;i++)
{
   if((a[i]&1)==0)  //number i+1 never seen, so something else is duplicated
     result=true;
   a[i]/=2;
}
return result;
CodeInChaos
You loop through the array 3 times. Isn't that `O(n*3)` time?
Core Xii
Check the definition of the big O notation. O(n)==O(n*3). http://en.wikipedia.org/wiki/Big_O_notation
CodeInChaos
What if `a[i] *= 2` overflows?
IVlad
If n is doubled then the execution time doubles, hence it's in O(n). Unfortunately this solution is also O(n) in space...
SimonJ
@IVlad overflow would be an issue, but the answer states the requirement that MaxInt>=2*n+1, so as far as interview questions go he's covered it:)
Steve Haigh
@IVlad That's why I added the requirement of MaxInt>=2*n+1. But yes technically it's still O(n) in space. And if one is really precise the input data is O(n*log(n)) in size and thus *any* algorithm is at least O(n*log(n)) in time.
CodeInChaos
Returns `true` for both `a = {1, 2, 3, 4}` and `a = {1, 2, 3, 3}`, `n = 4` for me in C++.
IVlad
It's cheating but very clever cheating :)
tauran
I agree with tauran. This is clever cheating.
Jessica
It is plain cheating... if you are using an array of "flags", then you may as well call it a space requirement of O(n). What is the difference between simply using an array of flags and "hacking it" to use an array of flags? And for any problem that requires O(n) space, you can hack it to use an extra 32 bits too... or an extra 64 bits by assuming a "bigger" integer size, and that is totally non-O(1).
動靜能量
+1 for creativeness, cheating or not
smirkingman
@Core Xii: 3*n basic operations is O(n)
dfens
I would confirm with the interviewer that it is an array of 'integers' and not 'unsigned integers', and then would use the negative sign as the flag, rather than multiplying by two
Michael
This partial answer may please the interviewer for creativity.
SHiNKiROU
Isn't this O(n) in memory?
Green Code
+10  A: 

This paper describes a solution to a similar problem under the same constraints. It's pretty advanced but seems to be what you're after.

The only difference is that there's always a duplicate in that problem, whereas in your case there might not be. So you just need to figure out how to detect that.

IVlad
That's an interesting paper, but it doesn't directly apply since their technique to find an indirection list of infinite period in constant time does not work in this problem.
Jessica
Unfortunately only a straightforward O(N²)/O(1) solution comes out of the paper for this problem, which isn't asymptotically faster than the simpler O(1) space solutions.
Nabb
@IVlad: If you have n numbers and each number is 1..n either you have 1..n exactly or you have at least one missing number AND at least one duplicate.
Hightechrider
+30  A: 

For a probabilistic solution, compute m different order-independent checksums and verify that they all correspond to the known checksums for an array containing every number once.

To choose checksums, you could sum, multiply, sum the squares, multiply the squares, etc., allowing each of them to wrap rather than using bignums. There's probably some way to synthesise an arbitrarily large family of checksums. For instance checksum q could sum the result of multiplying each element by the (1000+q)th prime number.

Crude analysis: Assuming 32-bit checksums with passable avalanche and reasonable independence between checksums, the probability of a collision is about 1/2^(32m). This means that a five-checksum verification is on par with sha1.

Edit: Note @Steve Jessop's point in the comments that simple multiplication with different multipliers won't produce independent checksums. @CodeInChaos's suggestion to use sha1 on each element would.
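CodeInChaos's comment below sketches one concrete construction: XOR together a cryptographic hash of each element, which is order-independent by commutativity of XOR. A hypothetical Python sketch (the function names are mine, and this illustrates the idea rather than any exact scheme from the answer):

```python
import hashlib

def permutation_digest(nums, salt=b""):
    # XOR of per-element SHA-1 digests. XOR is commutative, so the
    # result does not depend on the order of the elements.
    acc = 0
    for x in nums:
        h = hashlib.sha1(salt + str(x).encode()).digest()
        acc ^= int.from_bytes(h, "big")
    return acc

def probably_permutation(nums):
    # Cheap deterministic filter first: every value must lie in 1..n.
    n = len(nums)
    if any(not 1 <= x <= n for x in nums):
        return False
    # Probabilistic check: compare against the digest of 1..n.
    return permutation_digest(nums) == permutation_digest(range(1, n + 1))
```

A false "no duplicates" answer requires a digest collision, which is the probabilistic trade-off discussed above; as the comments note, a random salt hardens this against adversarially chosen arrays.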

Marcelo Cantos
That's a nice way to think about it.
Jessica
One implementation of this which even works if the array is chosen by an attacker who tries to fool your result is *xor*ing the sha1(a[i]+salt) of all entries with a random salt.
CodeInChaos
@CodeInChaos: Hehe, yes I guess that qualifies. O(100*n) == O(n).
Marcelo Cantos
There is one problem: the number of checksums you'll check will grow with `n`, so technically the complexity will be slightly larger than O(n).
ruslik
@ruslik: Fix *m* to some constant you are comfortable with. I'd be happy with *m* = 5 for any size of input.
Marcelo Cantos
"For instance checksum q could sum the result of multiplying each element by the (1000+q)th prime number" - those would be redundant, surely. If you just multiply by a constant, they'll either all match, or none of them will. However, if checksum `q` raised each element to the power of `q` modulo some large prime, then we could be in business. Ideal would be a hash function which is cryptographically secure *except* for the property that it ignores the order of its input bytes. Not sure if such a thing exists and/or can be easily derived from a normal hash as CodeInChaos implies.
Steve Jessop
@ruslik I don't think it grows with n. It only grows if you want to decrease the failure probability. The failure probability is just 1/2^m (m being the bit length of the hash) and thus independent of n, and for m > 100 it is small enough for all practical purposes.
CodeInChaos
@Steve: The choice of multipliers would be important, and my off-the-cuff suggestion is probably too low, but as long as they are sufficiently large to cause wrap-around, they should be fine (and cheaper than raising to some power). The basis of this idea is the [LCG](http://en.wikipedia.org/wiki/Linear_congruential_generator), so perhaps using five common LCG multipliers would give sufficient avalanche and independence.
Marcelo Cantos
Sorry, I don't understand the suggestion then. For n=3, (Kx + Ky + Kz) = (K*1 + K*2 + K*3) if and only if (x + y + z) = 6, regardless of K and regardless of wraparound (as long as K isn't a power of 2, of course, but you've excluded that already). I thought this was relevant to your suggestion.
Steve Jessop
@Steve: You are absolutely right. I'm forgetting my modulo arithmetic.
Marcelo Cantos
I voted for this answer anyway, btw, because it's the best so far and I think something along the lines of CodeInChaos's suggestion will probably provide a good set of checksums.
Steve Jessop
@CodeInChaos You're right. I see now.
ruslik
The probabilistic approach is interesting; if one has a set of independent bijective functions, one can compute the checksum of multiple such mappings, selected at random. If the time required for each such mapping is independent of 'N', one could reduce the probability of a false 'no duplicates' report to any particular desired level by using a fixed number of passes.
supercat
I don't get the "crude analysis": there are n^n possible sequences of n numbers in the range [1..n]. So, if I want to be reasonably sure that there's no collision between the checksums of [1..n] and the checksums of any sequence containing duplicates, then n^n * (1/2^(32m)) has to be small. And that's only true if n << m. So the solution is at least O(n^2).
nikie
@nikie: This is standard hashing theory, which relies on the ability to have m << n; if you couldn't, then sha1 wouldn't be useful. You don't need to guarantee that there are no collisions at all, which would obviously require m >> n. You just need to ensure a low probability that a bad sequence coincidentally generates the same checksum as a known solution.
Marcelo Cantos
I don't like probabilistic solutions because it is just "waiting for it to break" -- it is like saying the chance of the algorithm failing is very small, but it does happen. Like the chance of winning the lottery is very small, but still some people win it.
動靜能量
@nikie: Minor correction on my previous comment. To guarantee no collisions you only need m == n, where the "hash" is just the sequence itself.
Marcelo Cantos
@動靜能量: In a universe built on quantum physics, there is no such thing as a non-probabilistic solution. Every possible test for equivalence has a non-zero probability of failure. Scientists and statisticians generally consider a probability of 10**-50 to be zero for all intents and purposes, and sha1 or something similar is on par with that. In practice, the probability of a false positive or negative is much higher than that, since CPUs fail, gamma rays from the sun flip bits randomly every once in a while, etc. The mathematics of cryptographic hashing is usually the least of your worries.
Marcelo Cantos
@動靜能量 The chance of random CPU errors is much higher than the chance of a hash collision for typical hash sizes (160-512 bits), so I wouldn't worry about that. Unless there is an attacker who breaks the crypto behind the hash function, it would take all the computers in the world an astronomically long time to produce a random collision in a 512-bit hash function.
CodeInChaos
@動靜能量: P.S.: For a bit of perspective on the numbers, the probability of winning a national lottery is about 10**38 times greater than that of seeing a sha1 collision between two different sequences. If you ran one billion sha1 comparisons per second on random pairs of documents for one billion years, you would still be 10 quadrillion times more likely to win the lottery than to see a collision.
Marcelo Cantos
@Marcelo Cantos it seems like the analysis is purely based on sha1, but disregards the size of `n`. What if `n` is really large, so large that it doubles or triples the order of magnitude of 10^-50? Then the chance of failure is not as simple as 10^-50.
動靜能量
@動靜能量: There are two problems with this, either one fatal on its own. 1) How do you propose to create an array of size 10^100? There aren't that many atoms in the known universe, even if you include purported dark matter.
Marcelo Cantos
@動靜能量: 2) *n* is not the number of checksums you are comparing to each other, it is the size of the input to a single checksum, and the size of the input has absolutely no effect on the security of sha1. Perhaps @CodeInChaos's idea of computing a sha1 sum on each character confused you, but these sums aren't compared to each other, they are simply used to build a total checksum for the entire array. It is this total checksum that is compared to a known checksum to check that the array has no duplicates. Only one checksum comparison is made, with a 2^-160 (about 10^-48) probability of failure.
Marcelo Cantos
If the array size is about 2^(m/2) then we get collisions. So in theory we need O(log(n)) storage for the hash and O(n*log(n)) calculation. But in our finite universe this doesn't really matter.
CodeInChaos
A: 

This should do the job:

bool has_duplicates(const int * a, int n) {
    int expected = (n * (n + 1)) / 2;
    int computed = 0;
    for (int i = 0; i < n; ++i) computed += a[i];
    return expected != computed;
}

Hint: "array with n integers" and "integer is between 1 and n (inclusive)"

rados
There can be multiple duplicates in the array!
nikie
This won't work: e.g. n = 4: no dup 1+2+3+4 = 10, counterexample 1+1+4+4 = 10 also
Peer Stritzinger
no, fails on {2,2,2}: expected = 6, as is computed.
Paul
+1  A: 

I think rados was on to something. If, in addition to checking that the sum adds up to n(n+1)/2, you also check that the product of the array is n!, then either check failing means you have duplicates.
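A sketch of the proposed double check (Python; illustration only, since, as the comments point out, the intermediate values blow the O(1) space budget, and the function name is mine):

```python
import math

def passes_sum_and_product(a):
    # The proposed test: the sum must be n(n+1)/2 and the product n!.
    # Either check failing proves a duplicate exists; passing both
    # does not by itself prove their absence.
    n = len(a)
    return sum(a) == n * (n + 1) // 2 and math.prod(a) == math.factorial(n)
```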

corriganjc
You can't compute `n(n+1)/2` and `n!` under the given constraints.
IVlad
n! takes O(n) bits to store.
Jessica
@Jessica: ... and n takes O(log n) bits to store, which is also beyond the O(1) space requirement.
William
+15  A: 

Looks like this will work:

Think of each array slot as a linked list node, containing the address of the next slot in the list. We calculate the address of next slot as (k-1), where k is the number stored in the slot, so if array[0] == 4, then next slot to visit is array[3]. An example array:

S = 0 1 2 3 4 (Array Idx)
k = 1 2 3 4 5 (Array Values)

The above array has 5 linked lists of one node each i.e. each list's last slot (the only slot of each list) pointing to the starting slot (i.e. itself). Another possible example:

S = 0 1 2 3 4 (Array Idx)
k = 2 1 3 5 4 (Array Values)

This array has 3 lists (2 1), (3), (5, 4) - two of them containing cycles ending at the starting node i.e. with the last slot in the list pointing to the starting slot of the list. Another example:

S = 0 1 2 3 4 (Array Idx)
k = 2 3 3 5 4 (Array Values)

This array has 3 lists as well, with one containing a cycle, that does not end at the starting node of that list, but at the last node itself: (2,3). There's another list starting in slot 1, which shares a slot with another list (3,3). And the last list is (5,4).



Our goal is to identify a list, whose last node does not point to itself, or the starting slot of that list. If such a list exists there is a duplicate in the array. We'll use this strategy to do this:

  1. Start traversing the first linked list starting at slot 0. Save the starting node idx of the "current" linked list in a variable S, so S = 0.

  2. As we visit each node, we will mark that node as visited by multiplying the number in the slot by -1.

  3. If a visited slot 'k-1' contains a number 'm' such that (m-1) != (k-1) && (m-1) != S, and m < 0, we are visiting the slot again, and we have found our duplicate number 'm'.

  4. However, if m-1 == k-1 || m-1 == S, the traversal of this list ends there and we need to find the next list to traverse.

  5. Store the index of current slot in S. To find the starting node of the next list - walk down the rest of the array in a circular fashion, until you find a slot which does not contain a -ve number, or you reach the slot pointed to by S again.

  6. If you reached the starting slot S, all the nodes in the array have been visited, and there are no duplicates.

  7. If you find a new slot, save its index in S, and go back to 2.

  8. After we are done we'll need to make one more pass through the array to restore the values (we are not supposed to destroy the array). This is still O(n) though.
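The marking trick at the heart of steps 2-8 can be sketched on its own (a simplified Python sketch of the sign-bit idea, not the full list-traversal bookkeeping; the function name is mine):

```python
def has_duplicates(a):
    """Mark value v as seen by negating a[v-1]; a second visit
    finds that slot already negative. Restores the array before
    returning, as the question requires."""
    n = len(a)
    duplicate = False
    for i in range(n):
        v = abs(a[i])
        if a[v - 1] < 0:        # value v was already seen
            duplicate = True
            break
        a[v - 1] = -a[v - 1]    # mark v as seen
    for i in range(n):          # restore the original contents
        a[i] = abs(a[i])
    return duplicate
```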

Sudhanshu
O(n) in space (you're using the sign bit). Nice try.
SimonJ
@SimonJ: Not using extra space though apart from S. Just what's already there. :)
Sudhanshu
@Sudhanshu: Nothing says that the domain of values of the array extends from -N to N.
Nabb
@Nabb: I am not sure I get what you are saying. I am just transforming already stored numbers in the array to be -ve to mark them as already visited. That's possible because the question says the domain is between 1..N. And I restore the numbers in the array at the end, after finding the duplicate.
Sudhanshu
This is a very slick solution, and is probably as good as is possible.
Jessica
Using -1 as a flag is possible, and I think it's the key to obtaining other solutions.
Fabio F.
Your step 2 is just the same cheat I use. You use the sign bit, I use the lowest bit. But your algorithm is more complicated without avoiding the cheat.
CodeInChaos
@CodeInChaos: If I had asked this question during the interview, I'd really not consider this as cheating. I don't think your approach is cheating either - except that it has the disadvantage of reducing the range of the numbers. I really didn't look at your approach before using the sign bit as a flag in my solution though - I had seen this approach used before in some other place (for something else). :)
Sudhanshu
You require the range of numbers from -n to n, and I from 1 to 2n+1. Personally I think both solutions are cheating, but since nobody here has found a clean solution yet this kind of cheating is probably the best **deterministic** solution one can expect in an interview.
CodeInChaos
@Sudhanshu: it all depends whether "integer" in the question should be taken to mean `signed int`. What if "integer" is supposed to mean `unsigned int`, and n is supposed to be allowed to range all the way to UINT_MAX? The question doesn't say, so all you'd need is to clarify with the interviewer before continuing. Unfortunately we don't have the interviewer here to ask, so we talk about "cheating" and "not cheating" instead...
Steve Jessop
@CodeInChaos, @Steve: Yup, I agree with you guys - it all depends on how you see it. I am happy with the top answers on this page too. I am really not looking to prove that someone's wrong - given that we don't have the interviewer here, we cannot determine exactly what she wanted. It was satisfying to get to this answer - in the end that's all that matters. :)
Sudhanshu
The problem may be trivially solved if there's a spare bit in each array element, since that's O(N) extra space. One could require that each array element be able to hold numbers from 1 to N+k for some constant k (the amount of extra space this would require for each element diminishes as the number of elements increases). I wonder if that would be useful.
supercat
I think you're on the right track. You can find duplicates in a linked list without modifying it, so you should be able to fix your solution.
Gabe
@Gabe: Yeah, you're right about the fact that I can find duplicates in a linked list without modifying it - however, since there can be multiple linked lists in the array - the problem boils down to whether you can keep track of which linked lists you have already looked at without modifying the list? That eludes me so far...
Sudhanshu
+5  A: 

Answered here

looks like I was on the right lines with the linked list idea, but I'm not sure I would have got there

Paul
The link here has some interesting ideas, but it does not answer the same question. He is answering the question of how to find duplicates in an array of size n containing numbers 1 to n-1.
Jessica
Looks like the same algorithm as what IVlad linked to, and doesn't solve the problem.
Nabb
This finds a duplicate value when it's known that the array contains at least one such duplicate. How would you adapt it to show the existence of a duplicate in the first place?
SimonJ
I think using Floyd's cycle-finding algorithm might also lead to a solution to the question here ... some tweaks are needed though.
Peer Stritzinger
It's not that case; here you can't use the pigeonhole principle because 1. you don't know that any duplicate exists, and 2. the numbers are between `1..n`; for the pigeonhole principle they should be between `1..k` where `k < n`.
SaeedAlg
+3  A: 

The paper that IVlad linked to inspired me to think of this solution (which unfortunately also cheats by destroying the array):

The idea is that you start at index 1. Then you look at the sequence 1,a[1], a[a[1]], a[a[a[1]]], and so on, and you look for when that sequence hits 1 again, or comes back to a number that it's already hit. If the latter occurs then you know that there is a duplicate. The cheating part comes in where you zero out every array element in this sequence. The reason for this is two-fold: (a) it makes it easy to determine if you hit a number twice in your sequence and (b) it makes the runtime O(n), since we mark this element as not needing to be iterated on when we go through the rest of the array.

Here is python code implementing the algorithm and some test cases, in case my description is not clear:

def arrayHasDuplicates(array):
    startLoopIndex = 1

    while startLoopIndex < len(array):
        loopIndex = startLoopIndex
        while array[loopIndex - 1] > 0:
            nextLoopIndex = array[loopIndex - 1]
            array[loopIndex - 1] = 0
            loopIndex = nextLoopIndex

        if loopIndex != startLoopIndex:
            return True
        startLoopIndex = startLoopIndex + 1

    return False

def test(array):
    print(str(array) + " duplicates? " + str(arrayHasDuplicates(array)))

test([1])
test([1,2])
test([2,2])
test([1,2,3])
test([2,2,2])
test([3,2,3])
test([1,1,1])
test([3,1,2])
test([7,5,3,1,2,4,6])
test([7,5,3,1,2,2,6])
Jessica
Getting closer ... this inspired me to my answer ...
Peer Stritzinger
There's no need to zero out elements to detect if you've hit a cycle somewhere other than your start point. Just let the iteration run 'n' times and if you haven't hit your start point, you've hit a cycle somewhere else. Since when that happens, you're done, that case will only happen once and it doesn't matter if it takes time 'n'. If one could compute the convolution of an array in-place in linear time, that might be good (since the convolution of a permutation would be reversible) but I don't see any nice way to do that.
supercat
You need to zero out elements to mark cycles you've already visited, otherwise the algorithm is O(n^2).
Jessica
A: 

(Update: per Jessica's and Fabio's comments, we need to check that each number is less than or equal to n, and also that the numbers add up to (1+n) * n / 2. I was assuming at first that they add up to that number but some numbers are greater and some smaller, compensating for each other; with that condition, can they still multiply together to exactly n!?)

(Update: the idea is to find a mathematical property f such that f(1, 2, 3, ..., n) gives a number that cannot be duplicated by a different set of numbers. If f alone is not enough, then maybe f and g together, or f, g, and h together, but that's the idea behind approaching the problem this way.)

I suspect we can multiply all the elements together and check the result against n! (n factorial).

This assumes the floating point stays accurate no matter what order the numbers are multiplied in.

Or we can use integers (bignums), but then, strictly speaking, I think it is O(n log n).

This requires a mathematical proof: we expect a perfect occurrence of 1, 2, 3, ..., n, and the claim would be that any deviation from it cannot multiply to exactly n!.

Also, if we look at:

1 2 3 4 5 6 7 8 9

if we first make sure there is a median 5, with 4 to its left and 6 to its right (if any of those numbers doesn't exist, we know there must be a duplicate by the pigeonhole principle), then we go through the unsorted array, compute the sum of (a[i] - 4)^2 and the sum of (a[i] - 6)^2, and compare them to the sums for a perfect 1 2 3 4 5 6 7 8 9. In a case like 1 2 3 3 5 7 7 8 9, measuring the deviation from 5 by (a[i] - 5)^2 is not useful, but if we shift the point the deviation is measured from, the sums can't both match. We could also use (a[i] - 4)^3 instead of squares. It really comes down to finding a mathematical property where a perfect f(1, 2, 3, ..., n) cannot equal f(n1, n2, n3, ...), or, if f alone is not enough, a combination of f and g that together ensure that when they match, the numbers can only be 1 to n, each exactly once.

動靜能量
(1,1,6) breaks your algorithm.
Jessica
hm... with the check that no number can be greater than `n`
動靜能量
(1,2,3,4,5,6,7,8) = (1,2,3,3,5,8,7,8)
Fabio F.
sorry, forgot to add that, we already first check that they add up to `(1+n)*n / 2`
動靜能量
No, this doesn't work.
Jessica
Both [7, 5, 10, 4, 9, 9, 4, 4, 1, 2] and [1,2,3,4,5,6,7,8,9,10] sum to 55 and have a product of 10!.
Jessica
haha Jessica you are good at finding cases where it breaks
動靜能量
The property exists, and it is that for each different set of numbers the sum of 2^a[i] is different.
Fabio F.
I'm not good at finding cases where it breaks .... my computer is.
Jessica
do you just "brute-force" all the numbers to see if it breaks? Then if `n` becomes 20 and it hasn't broken, your algorithm can take a very long time to run!? (20^20 is 1.05E+26)
動靜能量
+1  A: 

This won't work, see the comments ... just leaving it for discussion's sake

Consider the following directed graph: n nodes (one for each array position) edges are from node k -> node a[k]:

Use Floyd's cycle-finding algorithm to find a cycle and measure its length. If the length is equal to n there are no duplicates.

Only if the numbers are exactly the set 1..n will the cycle length be exactly n.

Cycle detection

Even better, use Brent's algorithm to find the cycle length directly (Brent's algorithm, with a Python example):

Brent's algorithm: Richard P. Brent described an alternative cycle detection algorithm that, like the tortoise-and-hare algorithm, requires only two pointers into the sequence. However, it is based on a different principle: searching for the smallest power of two 2^i that is larger than both λ and μ. For i = 0, 1, 2, etc., the algorithm compares x_{2^i−1} with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise-and-hare algorithm: it finds the correct length λ of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of f rather than three.

Peer Stritzinger
what about (2,1,3)? Doesn't this have two independent cycles, of lengths 2 and 1? And I found no way to extend cycle-based code quickly to multiple cycles.
CodeInChaos
rats you are right :-(
Peer Stritzinger
+5  A: 

Using Sudhanshu's idea, a possible algorithm in Java could be:

boolean findDuplicate(int[] array){
    for(int i=0;i<array.length;i++){
        array[Math.abs(array[i])-1]*=-1;
        if(array[Math.abs(array[i])-1]>0) return false;//key already visited
    }
    for(int i=0;i<array.length;i++){
        //if all is fine all values are negative now
        array[i]*=-1;
    }
    return true;
}

Is it correct?

(Thanks to Maciej Hehl)

Fabio F.
You want array[array[i]] not array[array[i]-1]. and then (array[array[i]])>0 on the next line.And there is a potential tricky sign bit problem.
Ohmu
mm, array={1} goes out of bounds with array[array[0]] ...
Fabio F.
I'd change `array[array[i]-1]*=-1;` to `array[abs(array[i])-1]*=-1;` and the next line in a similar way.
Maciej Hehl
+8  A: 

The original question says:

figure out if there are any duplicates in the array in O(n) time and O(1) space without destroying the array

Does re-arranging the elements count as "destroying the array"? For example, if those are just some data collected in an experiment where the order of performing the experiment is not important, then the order of those numbers doesn't matter.

In that case, we can take element 1, say, if it contains 8, then we swap element 1 with element 8, so now element 8 contains the number 8.

Now we look at element 1 again, if it contains 3, we swap it with element 3. If we are swapping into an element that already contains that number, then we have found the duplicate. If we reach the end of array without trying to swap the same number into a slot that already has that number, then we know there is no duplicate.

Just keep on swapping element 1 in this fashion until 1 is swapped into element 1, in which case we will move onto element 2.

It is like doing a Radix sort. It may look like it is O(n * n), but since each swapping will swap 1 more correct element into place, we can at most swap n times, plus the "moving along the array" which is also O(n), so it is O(n + n), which is still O(n).
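A sketch of the swapping procedure just described (my own code, not from the thread; note that it reorders the array, which is exactly the point debated in the comments):

```java
public class SwapDemo {
    // Repeatedly swap a[i] into the slot where its value belongs.
    // If the destination slot already holds that value, it's a duplicate.
    static boolean hasDuplicate(int[] a) {
        for (int i = 0; i < a.length; i++) {
            while (a[i] != i + 1) {
                int target = a[i] - 1;       // home slot for value a[i]
                if (a[target] == a[i])
                    return true;             // slot already holds this value
                int t = a[target];           // swap a[i] into place
                a[target] = a[i];
                a[i] = t;
            }
        }
        return false;  // every slot i now holds i+1: a permutation
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(new int[]{2, 3, 4, 5, 1})); // false
        System.out.println(hasDuplicate(new int[]{2, 2, 3}));       // true
    }
}
```

Each swap permanently fixes one slot, so there are at most n swaps in total, keeping the whole procedure O(n).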

動靜能量
Agree, worst case is something like [2, 3, 4, 5, 1] where every element needs swapping but once you've followed the cycle (bounded in size by the array length) you need only visit each element one more time. I suspect this is what's meant by "destroying", though...
SimonJ
I interpret rearranging the elements as destroying the array.
Don Kirkby
+3  A: 

[edited!]

The only problem seems to be that the question is not specifying whether these are signed or unsigned integers. If they are signed, we can use the sign bit for storage, as they are all guaranteed to be positive, and the solution is relatively trivial, and has been presented. For the sake of completeness, I present it once more:

//Solution involves the fact that we can use the sign-bit of 
//the i'th element to record whether the value i is duplicated.

// suppose array is P[0] thru P[N-1]
int i;
for (i = 0; i < N; i++)
{
    int Pi = ABS(P[i]) - 1;  // values run 1..N, indices 0..N-1

    if (P[Pi] < 0)
        break; // DUPLICATE

    P[Pi] = -P[Pi];
}

bool foundDuplicate = (i < N); // loop stopped early iff we hit a duplicate

// now clean array
for (int i = 0; i < N; i++)
    if (P[i] < 0)
        P[i] = -P[i];

(NB: this method is not thread safe as it messes with the array before restoring it)

Note that zero duplicates implies that the array contains each of the numbers 1 thru N; it implies that there are no omissions. So we only need to check for duplicates.

As it is an interview question, it would probably be bad practice to optimise it straight away, as it would just give the interviewer a headache, but it could be sped up tons.

If, however, we are given unsigned integers, then we are not guaranteed a sign bit to play with. In this case a solution is not obvious, and if one exists, it is almost certainly going to be impractical. My gut feeling is that this problem with unsigned integers is intractable, and that a proof of this could be arrived at, though it would be very tricky.

It is worth remembering that this is a question in the context of an interview, and the interviewer is likely looking for qualities such as the ability to work efficiently, rather than eagerness to waste hours worrying out theoretical minutiae. Business is usually more a matter of making gut calls; most problems in business do not have a perfect solution.

Also, a company hiring software engineers would want people with the ability to detect a flawed specification, rather than simply those that would blindly do what they have been told.

I know this is not going to be a popular angle, but being realistic, while an interview for a position in a university would appreciate a lengthy discussion, an interview for a position in a corporation would most likely put the emphasis on 'ability to get things done'.

I suspect also that the interview question assumes signed integers. It is a syntax convention of C, is it not, that an integer is assumed to be signed unless explicitly marked with the 'unsigned' keyword? I'm not sure if this is contained in the original C specification, but I would be surprised if it isn't.

Ohmu
You're the third one with this idea. But since you assume an integer size with one more bit than is needed to fit n (the sign bit), this can be considered cheating. Check the solutions of me and Sudhanshu.
CodeInChaos
Of course the question doesn't specify whether the data type of N is the same as the datatype of P[]. I am guessing that this algorithm together with a brief exposition of the potential pitfalls would be enough to make the interviewer happy.
Ohmu
A signed 16-bit int typically has a range of -32768 to 32767, so this is no problem. But nobody stated that they are signed. The only thing we know is that they support the interval from 1 to n.
CodeInChaos
I just deleted my comment which CodeInChaos was answering above, as it contained wrong information.
Ohmu
A: 

Interesting puzzle. If one assumes that array elements are sized to precisely fit numbers 1..N, then there will be a tiny bit of extra space in the array, but I'm not sure how one could really exploit it, especially since the number of array elements one has to munge to get each extra bit of storage will increase with N. Indeed, experiments with values of N up to 2^24 suggest that one will have to munge about 90.9% of the numbers to free up 'n' bits (this ratio is essentially the same with n=2^16 or n=2^24). If one could free up 'n' bits in linear time, of course, the problem would readily be solved from there on out, but I don't see any nice way of subdividing the problem which wouldn't result in NlgN or worse time.

supercat
A: 

Use a vector as a set to record each integer seen as you iterate through the array. The set will have constant (O(1)) size: one entry for each possible integer so the maximum number of entries is always MAX_INT. Only one iteration through the array is needed, so O(n) provided that the vector operations are constant.

someuser
and for n>MAX_INT it fails...
CodeInChaos
Did you not know that the number of integers is infinite? Your solution is fine for 32 bit numbers, but if we say the numbers are 64 bit, you'll need 2^56 bytes for your memory. Good luck with that.
JeremyP
Your solution is NOT O(1) space, it's O(n) space since as n goes up you need more bits in your vector.
Hightechrider
+2  A: 

A solution, assuming that:

  • we can modify the array as long as we put it back the way it was.
  • we can store numbers at least twice the size of n in each array element. Specifically, that there is at least one high binary digit available that will never be used for storing n.

The algorithm: Go over the elements of the array. At each element with value k, go to element k - 1 (assuming 0-based indexing) of the array, and set its high digit to 1. This gives us a bitmap of which elements are taken without destroying the original data.

Then, go over the array again, and check if there are any elements that do not have the high digit set. While doing so, clear the high digit to return the array to its original state.

If there are elements that do not have the high digit set, we know that there must be missing numbers, and hence that there must be duplicates.

So we end up using O(1) space (the boolean used to keep track of whether any element does not have its high bit set) and O(n) time (we loop over the array exactly 2n times).
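The two passes described above might look like this (my own sketch; it assumes n < 2^30 so that bit 30 is a spare "high digit" the stored values never use):

```java
public class HighBitDemo {
    static final int MARK = 1 << 30;  // assumed spare high bit (needs n < 2^30)

    static boolean hasDuplicate(int[] a) {
        // Pass 1: for each value v, set the mark on element v-1.
        for (int i = 0; i < a.length; i++) {
            int v = a[i] & (MARK - 1);   // strip any mark already set on a[i]
            a[v - 1] |= MARK;
        }
        // Pass 2: look for unmarked elements and restore the array.
        boolean missing = false;
        for (int i = 0; i < a.length; i++) {
            if ((a[i] & MARK) == 0)
                missing = true;          // value i+1 never occurred
            a[i] &= ~MARK;               // clear the mark, restoring a[i]
        }
        return missing;  // a missing value implies a duplicate (pigeonhole)
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(new int[]{3, 1, 2})); // false
        System.out.println(hasDuplicate(new int[]{1, 2, 2})); // true
    }
}
```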

Zarkonnen
You're the fourth one with that idea.
CodeInChaos
Check the comments on my solution for a discussion why it can be considered cheating.
CodeInChaos
Gah, yes, you're quite right. I did that thing where I thought I'd read the comments, and hadn't.
Zarkonnen
+1  A: 

Firstly, let me start off by saying that this is cheating. This is O(n) as O(n * SIZE_MAX), where SIZE_MAX is constant, is equivalent to O(n). And it uses O(1) space by allocating the same amount of space entirely independent of n.

This won't work on a real machine, as it is a horrible abuse of big-O notation.

int
test_dups (size_t *array, size_t n)
{
  int tmp[SIZE_MAX] = { 0 };
  for (int i = 0; i < n; i++)
    tmp[array[i]] += 1;

  for (int i = 0; i < SIZE_MAX; i++)
    if (tmp[i] > 1)   /* any count above 1 is a duplicate */
      return 1;
  return 0;
}
Joe D
Yep, I noticed this as well. +1 for pointing it out. This is a bit like the philosophy interview: Q:"Prove this table exists." A:"What table?" as it moves the focus to the definition of the question. if the question is designed to be tricky, a tricky answer is an apt response.
Ohmu
+2  A: 

A solution without doing any sum or multiplication:

Credit to Paul as I got this idea from his comment: "One thing about the problem as stated is every index must be present in the array. So if you consider the array as a set of linked lists where the value is used as the next index, then the sum of the lengths of all of the linked lists must be N. It seems this should be possible in O(N) but I can't see how to do it yet...".

Algo:

int i = 0;
int j = 0;
while(true) {
    if (arr[i] > 0) {
        arr[i] = - arr[i];
    }
    else {
        printf("duplicate number at index %d\n", i);
        break;
    }
    i = -arr[i] - 1; // arr[i] was just made negative; -arr[i] recovers the 1-based value, minus 1 for 0-based indexing
    j++;
    if (j == n) {
        break;
    }
}
// Again run a loop to make each number positive if it had become negative.

Proof:

1) If j < n and we are able to detect a cycle then there is a duplicate.

2) If j = n and we still did not catch cycle, then we have gone through each index and so each number is unique.

3) From the pigeonhole principle, if at least one number is missing, then some number must be duplicated

Niraj Nawanit
Yet another variation of the cheater solution where you use one bit of each array element as memory.
CodeInChaos
A: 

If the numbers are unique they can be sorted in O(n). Assume the numbers are in an array a indexed 1..n. Go to element 1; if it is 1, go to the next element, and continue this way until there is an element out of place, say m. Go to a[m]. If a[m] == m there is a duplicate; otherwise set t = a[m], a[m] = m, m = t, and continue. Eventually you will end up where you started. The starting element is now in place and you can continue as before. This will only fail to sort the array if there are duplicate elements, and that will be detected in the process.

chuck
And by sorting the array in place you destroy it. And this solution has already been posted.
CodeInChaos
It can't be sorted in O(n).
Jessica
A: 

here's something that came to mind

let s = 0
go through the array, and for every element a[i],
if a[i] > i, s++
if a[i] < i, s--
else do nothing

if you finish with s = 0, there are no duplicates.

my intuition is that if every number from 1 to n appears once, this sum will always be zero regardless of permutation, but I don't really know if this is true

unfortunately this requires O(log n) space even if it does work.

Bwmat
{2,3,1} gives 1 but has no duplicates. { 3,3,1,1} gives 0 but has duplicates.
Paul
(2,2,2) => s = 0
CodeInChaos
yeah I had a feeling it wouldn't work. Oh well.
Bwmat
+1  A: 

Based on "http://www.wolframalpha.com/input/?i=Sum(e^(k/n*2*pi*i),1..n)"

if n is prime compute

sum=0
for k in 1..n: sum += exp(a[k]/n * 2*pi*i )

Now, it's enough to check if sum==0. (Here i = sqrt(-1) and computation is on complex plane).

Primality of n is important, because there is no way to make the sum be 0 when there are duplicates (no subset of n-th order roots of 1 on the unit circle sums to 0, except the whole set). It can however be pretty close to 0 in some cases of repeating numbers, and the difficulty is to assess how close.

Edit:

The precision of the floats needs to be high enough to see if sum == 0. I don't know yet how to bound the number of bits needed for the complex numbers with respect to n.

Could it be that the correct solution has something to do with Fast Fourier Transform?
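A numeric sketch of the idea (my own code; with doubles it cleanly distinguishes small cases, though bounding the precision needed for large n is exactly the open issue discussed above):

```java
public class RootsDemo {
    // Sum the n-th roots of unity selected by the array values and
    // return the magnitude of the sum; ~0 suggests a permutation.
    static double rootSumMagnitude(int[] a) {
        int n = a.length;
        double re = 0, im = 0;
        for (int v : a) {
            double ang = 2 * Math.PI * v / n;
            re += Math.cos(ang);
            im += Math.sin(ang);
        }
        return Math.hypot(re, im);
    }

    public static void main(String[] args) {
        // n = 5 is prime; a permutation sums (numerically) to ~0
        System.out.println(rootSumMagnitude(new int[]{3, 1, 5, 2, 4}));
        // a duplicate leaves a clearly nonzero sum
        System.out.println(rootSumMagnitude(new int[]{1, 1, 3, 4, 5}));
    }
}
```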

macieksk
I'd guess the required precision for the number is O(n), which would make this algorithm violate both constraints (the math on such large numbers would be at least O(n), so total time at least O(n^2)).
CodeInChaos
@CodeInChaos: hmm, numbers used here are of length 1 (they lie on a circle of radius 1), so any sum of such is < n... The question is, how much precision we need to really see if something is 0 or not.
macieksk
there are n^n - n! array states which must be recognized as wrong. These would fill n*log2(n) bits. So I'd guess we need about that many fractional bits to encode the number
CodeInChaos
@CodeInChaos: ...most of those n^n states are pretty far from 0, aren't they? So, many of them can blend together (as normally happens with floating point precision)... it seems that we only need enough precision around zero... then, the summation might introduce new errors... I don't know.
macieksk
Yes the bit count was only a rough estimation. But it still seems like you won't get below O(n) bits. And proving correctness and not just that it's likely correct is really hard. So I'd rather go with Marcelo's probabilistic solution since it's easy to estimate the chance of failure for it.
CodeInChaos
If FFT leads to the correct deterministic solution here (with O(log n) space), then, of course it's hard :) Marcelo's solution is really nice.
macieksk
...if not FFT then this should give the solution: http://en.wikipedia.org/wiki/Ramanujan's_sum
macieksk
+2  A: 

It's easy to find one duplicate: start from the first position in the array and check its value. If it is 1, move to the next position; if it is some i > 1, move it to a[i] (saving the current a[i] value), set the 1st position's value to a[1] + n, and move the previous a[i] value to its appropriate home in the same way (i.e. if a[i] == j, move it to a[j], and do the same with a[j]), and so on. If, while doing this, you try to place two values marked n + i into the same home, you have found a duplicate. In the case where a[1] is already in place (the 1st value in the array), select a[2] and do the same. The running time of the algorithm is O(n) (because you visit each home at most twice) and the extra space used is O(1). Also, during iteration you should skip items whose value is bigger than n.

SaeedAlg
And how do you restore the array in the end? Your code looks destructive to me.
CodeInChaos
@CodeInChaos, it's easy: just do the reverse in O(n), because you have `n + i` so you know its exact place :D. I said just one duplicate; when you find it, you do the reverse.
SaeedAlg
Can you post that as code? Your description sounds like 動靜能量's solution, but his destroys the ordering of elements.
CodeInChaos
@CodeInChaos, Ok but i didn't find any similarity
SaeedAlg
Oh it's yet another of the cheater solutions where you use one bit on each array element as tag. Like I used the lowest bit while multiplying the original number with two, or like using the sign as tag.
CodeInChaos
@CodeInChaos, I tried to write it, but it's 2 am and I should go to sleep (I have to wake up at 8 tomorrow). I should change `n + i` to `n * j + i`. I'll write it tomorrow.
SaeedAlg
+1  A: 

This is in fact similar to CodeInChaos's solution: the original question states

You're passed an array with n integers in it.

without saying unsigned integers, so we can toggle the numbers to be negative:

For each element in the array: say it contains 8; then go to element 8 and toggle that number to negative. If it was already negative, that means it was toggled previously, and therefore a duplicate is found.

Otherwise, just go on, and if an element we reach has a negative number in it, just take the absolute value (-10 becomes 10, and go to element 10 to do the toggling).

If we can toggle every element without ever trying to toggle a value that is already toggled, then we know there is no duplicate.

At the end, we just go through the array and toggle the elements back to positive numbers. (And O(n + n) is the same as O(n).)

We can always negate the number: for example, for an 8-bit integer the max is 127 and the min is -128, so 127 can always be toggled to -127.

It is a bit like "using an extra bit", but make use of the fact that we can turn the number into a negative one.


if the given numbers are unsigned integers, we can still do it like this:

8 7 2 3 5 1 9 6 4

we use 2 pointers, one pointing at 8, one pointing at 1, and swap the numbers if the left one is greater than the median and the right one is less than the median. If the number on either side is "already on the correct side", then just move that pointer one step forward and repeat the above. This will move all numbers into the correct group on the left and right of the median, and is O(n).

1 4 2 3 5 8 9 6 7

now we have this array, we can do the toggling like the above by using 10 - n... so if we want to toggle 1, we put a 9 in it, if we want to toggle 6, we put a 3 in it. The idea is, we make it so that it is not supposed to be on the left side, or not supposed to be on the right side, to use as a flag.

So for example, if the number 1 is "toggled", it will be 9. The number 9 is not supposed to be on the left side of the array, so we know that this number has been toggled. It uses the same method as the algorithm first described in this post, only there the initial toggling is done by making the number negative, while here the toggling is done by using (n+1) - i.

This method involves moving the elements, so don't know if it counts as "destroying the array". (or we can use my other method in this thread that involves rearranging the items in array as well)

動靜能量
A: 

If the array has no duplicates, the sum of the elements of the array will equal ((n+1)*n)/2.

Fred
However if the sum is `((n+1)*n)/2` it does not follow that the array has no duplicates.
Don Roby
True, but if there are duplicates, the sum could also be n(n+1)/2. You can't use this alone to distinguish all the cases.
uckelman
For the record, Jessica mentioned long time ago [2,2,2] and [1,2,3] both add up to 6. or it can be [1,2,3,4,5] vs [1,1,3,5,5] or [1,2,3,4,5,6,7,8,9] vs [1,2,2,4,5,6,7,7,9]
動靜能量
A: 
n = 0
i = 0
retval = true
while true
  i = a[i]
  a[i] = -a[i]
  n += 1
  break if n == a.size
end
for i in 0...a.size
  if a[i] > 0
    retval = false
  else
    a[i] = -a[i]
  end
end

return retval

hope this works

dfens
A: 

Here's an interesting (though ultimately flawed) approach for algorithm connoisseurs:

  • Allocate a constant size buffer of length 100 integers
  • Loop through the entire input array for k=2 to 100, and create a histogram of (array[i] mod k) in your buffer
  • For each value of k, check that the histogram is correct (i.e. what you would expect for an array containing all the numbers 1 up to n)

This is O(n) time and O(1) space.

It's clearly not a perfect solution as it can fail if n is too large (it can't distinguish between 1 and 69720375229712477164533808935312303556801 for example).

It possibly may fail for some other very carefully arranged input. But I haven't yet been able to find a case with 64-bit integers where it doesn't work. Counterexamples welcome!
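A sketch of the histogram test (my own code; `passesModChecks` is a hypothetical name, and the buffer size is capped at 100 so the space used does not grow with n):

```java
public class ModCheckDemo {
    // For each modulus k up to 100, compare the residue histogram of the
    // array against the residue histogram of 1..n.
    static boolean passesModChecks(int[] a) {
        int n = a.length;
        for (int k = 2; k <= Math.min(100, n); k++) {
            int[] hist = new int[k], expect = new int[k];
            for (int v : a)
                hist[v % k]++;           // observed residues
            for (int v = 1; v <= n; v++)
                expect[v % k]++;         // residues of the full set 1..n
            if (!java.util.Arrays.equals(hist, expect))
                return false;            // some residue count is off
        }
        return true;  // all checks pass: probably (not provably) a permutation
    }

    public static void main(String[] args) {
        System.out.println(passesModChecks(new int[]{5, 3, 1, 4, 2})); // true
        System.out.println(passesModChecks(new int[]{2, 2, 3, 4, 5})); // false
    }
}
```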

mikera
p.s. this algorithm was inspired by the Miller-Rabin primality test - in the sense that it is using lots of modulo tests to show that there are probably no duplicates in the array.
mikera
A: 

Edit: Below, I'm going to assume that the O(1) space requirement "really means" that we can't store more than a constant number of integers, not bits. Yes, I realize you can store anything encoded in an arbitrarily-large integer. Oh well :-) Note that even storing n is going to violate the space requirement by requiring O(log n) bits of storage.

[Assuming the array contains only elements from [n], as stated by the OP:]

The array is a permutation of [n] (that is, the array passes the OP's check) if and only if the sum of elements equals T_n = n(n+1)/2 (the n-th triangular number) and the product of elements equals n!.

The reasoning: Suppose the array A is a permutation of [n]. Now suppose we want to generate an array A' to "fool" the above-implied algorithm that checks sum(A) and product(A). Without loss of generality, we begin with a sum-preserving alteration. Take any two elements x and y of A. To preserve sum, we change them to x-k and y+k for some 0<k<min(x,n-y+1). Now, for this alteration to be product-preserving, it must be that xy=(x-k)(y+k). This simplifies to k=x-y. Finally, this implies that the sum-preserving alteration has not, in fact, altered anything; x and y have simply switched locations in the array. Thus, A remains a permutation.

Edit: The above reasoning will not work, as one can essentially perform another sum-preserving alteration to compensate for the product difference generated by the first one. Thanks go to the commenters for patiently getting me round to this.
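Jessica's counterexample from the comments can be checked directly (my own sketch): both arrays below have sum 55 and product 10!, yet one is full of duplicates.

```java
public class SumProductDemo {
    static long sum(int[] a)     { long s = 0; for (int v : a) s += v; return s; }
    static long product(int[] a) { long p = 1; for (int v : a) p *= v; return p; }

    public static void main(String[] args) {
        int[] perm = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] dup  = {7, 5, 10, 4, 9, 9, 4, 4, 1, 2};
        // Both sums are 55 and both products are 10! = 3628800,
        // so sum + product cannot distinguish the two arrays.
        System.out.println(sum(perm) == sum(dup));         // true
        System.out.println(product(perm) == product(dup)); // true
    }
}
```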

William
Unfortunately, while computing the sum of all the elements may be done in "constant" space (meaning that one unit of storage is presumed to be capable of storing a number from 1 to N, no matter how big N happens to be) the space required to store N! is not going to be constant. It might be possible to subdivide things somehow, but I suspect one will either end up having to use a non-constant amount of space or non-linear time.
supercat
There are counter-examples to the hypothesis: "The array is a permutation of [n] (that is, the array passes the OP's check) if and only if the sum of elements equals T_n and the product of elements equals n!"
Jessica
n! is of similar size as n^n, and requires about n*log(n) bits or n integers.
CodeInChaos
@supercat, @CodeInChaos: very fair points about differentiating between "O(1) space" interpreted as being able to essentially store O(log n) bits, versus being able to store the O(n log n) size of n!.
William
@Jessica: I invite you to present the counter-examples that you claim exist.
William
@William: Consider the groups 3,4,12 or 2,8,9. Both total 19, and both multiply to 144. So replacing the numbers 3, 4, and 12 with duplicates of 2, 8, and 9 would not change the product nor the sum.
supercat
@supercat: I have to apologize here, since I made an unstated assumption that elements could only be from [n]. My hypothesis should be clarified; alas, this question's closure has disabled editing. My invitation stands for anyone to show counterexamples with elements only from [n].
William
I tried the edit again and it worked. My chrome is mis-behaving.
William
It seems impolite for me to comment so much, but I have to wonder at this point: Is anyone disagreeing with the "reasoning" paragraph? Or is this being skipped and are gut feelings the root of the "counter-examples exist" response?
William
@William - Both [7, 5, 10, 4, 9, 9, 4, 4, 1, 2] and [1,2,3,4,5,6,7,8,9,10] sum to 55 and have a product of 10!.
Jessica
As has been proved elsewhere in the thread, any finite number of check-sums can be defeated.
Jessica
A: 

What about checking for duplicates and THEN summing all the array elements? The problem is: how do you check for duplicates with O(n) time complexity? It sounds impossible, but there may be special mathematical properties of arrays of [1..n] which the interviewers may be expecting you, as a math contest winner, to know. Or, they may be expecting you to mention quantum computers.

My previous attempt, "If the array has no duplicates, then the sum is 1+2+...+n" failed, since sum [1,2,3,4] = sum [2,2,3,3].

This may help: http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html

SHiNKiROU
How about [2, 2, 3, 3] ?
Maciej Hehl
+1  A: 

You're passed an array with n integers in it. Each integer is between 1 and n (inclusive). These numbers are not in sorted order and there may be duplicates or missing numbers.

The problem is to figure out if there are any duplicates in the array in O(n) time and O(1) space without destroying the array

This is my solution (in Java). It is O(n) time and O(1) space and it does not destroy the array.

How does it work and why is this correct?

  1. My solution has a range limit where it works, but that limit cannot be reached in practice (it could be, if the JRE could address 8.59 GB of memory on a 32-bit kernel or 12.9 GB on a 64-bit kernel; and arrays need a sequential memory block :D).
  2. Assuming you can't do 1., this enables me to use 1 bit in every number in the array, because by the task definition values are between 1 and n. Also note that all values are positive.
  3. This is not sorting, because I know where the values will go in the array (like linear reordering). I'm not switching elements, but changing 1 bit in the numbers.
  4. At all times 31 bits will remain the same, and I'm not destroying any value that could be there. After I'm done, I will restore even that 1 bit that I temporarily changed. This can be thread safe if threads read data with value & ((1 << 31)-1).

Code example:

// Finds if array contains at least 1 duplicate
// O(n); Space(1); does not break array
public static boolean isDup(int a[]) {
    int n = a.length, temp, mask = 1 << 31, i;
    boolean result = false;

    for (i = 0; i < n; i++) {
        if ((temp = a[i]) < 0)
            temp ^= mask;
        if (a[temp - 1] > 0)
            a[temp - 1] ^= mask;
        else {
            result = true;
            break;
        }
    }
    for (i = 0; i < n; i++) 
        if (a[i] < 0)
            a[i] ^= mask;
    return result;
}

In case for C or C++ you only need to change int n = a.length in the method body.

Pro bono publico, an answer to a similar question, in case you need to print out all the duplicates. Here I'm exploiting the fact that XOR is a fast operation and 0 is false in C (this will not work in Java).

    // Finds all duplicates
    // O(n^2); Space(1); does not break array
    int i, j;
    for (i = 0; i < n - 1; i++)
        for (j = i + 1; j < n; j++)
            if (!(a[i] ^ a[j]))
                printf("Duplicate: %d : %d.", i, j);
Margus
http://en.wikipedia.org/wiki/X86-64
Margus
If you need one extra bit per integer, this yields O(n). There are many answers with your solution here.
dark_charlie
Extra bit is contained **inside existing array**, so no extra space is required, this is O(1). O(n) space would be, if I used external array for this. And at moment I posted this, there was no algorithm or example that was correct (and there is no atm. - you can not calculate checksums or add/multiply/divide anything).
Margus
A: 

Alright, this is different, but it works!

I ran this test program (C#):

    static void Main(string[] args) {
        //Run 100 tests, each test with an array the size of the test number
        for (int i = 1; i <= 100; i++) {
            int[] temp = new int[i];//array the size of the test number
            //Fill temp array with a permutation
            for (int j = 0; j < i; j++) {
                temp[j] = j + 1;
            }
            //Now that setup is done, test for a permutation
            bool check = IsPermutation(temp);
            Console.WriteLine("Permutation: " + check);
        }
    }

    static bool IsPermutation(int[] array) {
        int x = 0;
        for (int i = 0; i < array.Length; i++) { //O(n)
            //Because it's an XOR, it's commutative,
            //and doesn't depend on the order of the array
            x ^= array[i];
        }
        bool result = false;

        switch (array.Length % 4) {
            case 0:
                result = (x == array.Length);
                break;
            case 1:
                result = (x == 1);
                break;
            case 2:
                result = (x == array.Length + 1);
                break;
            case 3:
                result = (x == 0);
                break;
        }
        return result;
    }

So I put up my previous answer in a bit of a haste. My code wasn't very readable and didn't convey what I was doing. The actual algorithm is the IsPermutation method, everything else in the main method is setup for the IsPermutation method, therefore irrelevant for the O(n) requirement.

As for Jessica's { 4, 3, 2, 1 }:

First, XOR ALL elements, not just the first three: 4 ^ 3 = 7, 7 ^ 2 = 5, 5 ^ 1 = 4. So in my code, x is 4. Because array.Length % 4 == 0, we check the x == array.Length branch of the switch and it's true, so it's a permutation.

Corey Ogburn
This doesn't work.
Jessica
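Jessica is right: the XOR check is necessary but not sufficient. A concrete counterexample (my own, not from the thread): for n = 5, n % 4 == 1, so the test above expects x == 1, and the duplicate-laden array {2, 2, 1, 3, 3} XORs to 1 just like the permutation 1..5 does.

```java
public class XorDemo {
    static int xorAll(int[] a) {
        int x = 0;
        for (int v : a) x ^= v;
        return x;
    }

    public static void main(String[] args) {
        // Both arrays XOR to 1, so the switch-based test above would
        // wrongly accept the second one as a permutation.
        System.out.println(xorAll(new int[]{1, 2, 3, 4, 5})); // 1
        System.out.println(xorAll(new int[]{2, 2, 1, 3, 3})); // 1
    }
}
```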