views:

2242

answers:

12

I recently came across a question somewhere:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?

What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?

+55  A: 

Just add them all up, and subtract the total you would expect if only 1001 numbers were used from that.

Eg:

Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10

Input - Expected => 2
leppie
Wouldn't that require auxiliary storage?
Brian Rasmussen
@Brian Rasmussen: where is the extra storage?
leppie
@leppie: To hold the calculated sum, but honestly I don't know exactly what the OP meant by extra storage. In any case, I like your answer.
Brian Rasmussen
@Brian, the interviewer probably meant "don't use a hash table or an array"... I'm pretty sure O(1) storage, especially a single variable, would be satisfactory.
Michael Aaron Safyan
I'm guessing auxilliary storage would be storage that takes more than O(1) space.
Justin Ardini
+1, although your example is the wrong way around.
Aistina
@Michael: Thanks, that makes sense. Please disregard by question then.
Brian Rasmussen
the methord works perfectly fine. but the example should have been something like ( 1,3,2,4,2=>12 ) - (1+2+3+4 => 10) = 2
SysAdmin
This solution does not scale for bigger arrays. :-)
Franci Penov
@Franci Penov: I am not sure interview questions are supposed to scale :)
Brian Rasmussen
"I am not sure interview questions are supposed to scale"...but I am almost certain it would be the next question.
SysAdmin
@Franci Penov: Scale in terms of what? I sure hope you didn't mean overflow...
leppie
@Aistina P
leppie
@leppie - scale in terms of array size that can be safely processed without overflow. :-)
Franci Penov
@Franci Penov: It's an algorithm, the size of integers and arrays are unbound. Most languages could deal with big ints, the array size might be a problem, but you would have a problem before you even start ;P
leppie
sum-based algorithm requires O(log(n)) additional storage to hold the sum that is worse than O(1) for xor-based solution.
J.F. Sebastian
+15  A: 

Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.

Laurynas Biveinis
+4  A: 

Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.

kgiannakakis
+3  A: 

If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.

Justin Ardini
+1  A: 

Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once except for the number that is repeated.... so, the sum from 1....1000 is 500500. So, the algorithm is:

sum = 0
for each element of the array:
   sum += that element of the array
number_that_occurred_twice = sum - 500500
Michael Aaron Safyan
+1  A: 

No extra storage requirement (apart from loop variable).

int length = (sizeof array)/(sizeof array[0]);
for(int i=1; i < length; i++){
   array[0] += array[i];
}

printf("Answer : %d\n", array[0] - (length*(length+1))/2);
N 1.1
You are assuming the array is sorted. Bad assumption.
leppie
@leppie: how come? I didn't assume anything. And it actually uses any extra space like other answers suggest.
N 1.1
How is he assuming that?
Dennis Zickefoose
Although I do have issue with the premise. It requires an extra two ints.
Dennis Zickefoose
@Dennis: well loop variable has to be there, and `length` to make it generic.
N 1.1
@nvl: Even with academic exercises like this, a destructive algorithm that saves a single variable isn't particularly beneficial.
Dennis Zickefoose
+1  A: 

Do arguments and callstacks count as auxiliary storage?

int sumRemaining(int* remaining, int count) {
    if (!count) {
        return 0;
    }
    return remaining[0] + sumRemaining(remaining + 1, count - 1);
}

 

printf("duplicate is %d", sumRemaining(array, 1001) - 500500);

Edit: tail call version

int sumRemaining(int* remaining, int count, int sumSoFar) {
    if (!count) {
        return sumSoFar;
    }
    return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);
cobbal
This requires linear stack space, so that is definitely cheating.
Dennis Zickefoose
Throw in another argument and you can tail call optimize it.
cobbal
+42  A: 

Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)

Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)

printf("Answer : %d\n",
           array[0] ^
           array[1] ^
           array[2] ^
           // continue typing...
           array[999] ^
           array[1000] ^
           1 ^
           2 ^
           // continue typing...
           999^
           1000
      );

Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.

And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)

Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.

Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.

Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);

I will leave the exercise of figuring out why this works to you, with a simple hint :-):

a ^ a = 0
0 ^ a = a
Franci Penov
+1 for leetness
larsm
a very neat solution.. I must say
SysAdmin
A non-destructive method would be to maintain an accumulator on the side... would also make it more readable I think.
Matthieu M.
@Matthiey M. - but a non-destructive solution would require additional storage and thus violate the requirements of the problem.
Franci Penov
+1, Absolutely Brilliant! :-)
missingfaktor
And your solution requires additional storage to hold the loop counter, plus possibly some intermediate values depending on how all that gets compiled. However, chosing a destructive algorithm does prevent overflow issues without introducing a separate type to hold the sum, which gives it an actual purpose.
Dennis Zickefoose
@Dennis Zickefoose - I am not arguing that a non-destructive solution with an additional integer variable is not better. :-) but it does violate the problem requirement, that's why I choose the destructive algorithm. as for the loop counter - there's no way to avoid this one, and it is implicitly allowed because the problem states that the code is allowed to iterate over the array once, which is not possible without loop counter.
Franci Penov
Nice answer +1 but now my brain hurts!
leppie
When you look at XOR of 1 to 1000, XOR of the whole array and compare this procedure with adding 1 to 1000 and comparing with the sum of the whole array you notice similarity - addition and XOR operation are related in binary. The only difference that XOR is more natural here because we know that the final difference can not be bigger then 1000.
Unreason
Not that this isn't a brilliant solution, but doesn't your first solution, the loop, access each member of the array twice?
Robert Gowland
@Robert - technically, it does. Hence the second solution. :-)
Franci Penov
-1. `xor` tricks make me sick
Pavel Shved
It still have additional storage in the form of temporary values, no ?
fa.
@Pavel Shved - there's no trick to XOR, it's a mathematical operation with well known properties, just as addition, multiplication and others.
Franci Penov
@fa - I don't see any temporary values. For all I know, the compiler I've been given targets a CPU that has special instruction that does XOR on 1002 values at once. :-)
Franci Penov
@Franci, call it whichever you like; if you rename what makes me sick, it won't get any better :-)
Pavel Shved
@Pavel - so you must get sick of computers in general then? cause without XOR, nothing really works... :-)
Franci Penov
@Pavel - also, you and I look at the problem in different ways - for I am not searching for duplicate number, I am searching for a duplicate pattern in sets of flags. when you state the problem in this way, using addition now becomes the "dirty trick" :-)
Franci Penov
@Franci: Wow, the stuff about flags makes sense. Perhaps, you might want to edit it into your answer. Then I can change the vote :-)
Pavel Shved
+5  A: 

To paraphrase Francis Penov's solution.

The (usual) problem is: given an array of integers of arbitrary length that contain only elements repeated an even times of times except for one value which is repeated an odd times of times, find out this value.

The solution is:

acc = 0
for i in array: acc = acc ^ i

Your current problem is an adaptation. The trick is that you are to find the element that is repeated twice so you need to adapt solution to compensate for this quirk.

acc = 0
for i in len(array): acc = acc ^ i ^ array[i]

Which is what Francis' solution does in the end, although it destroys the whole array (by the way, it could only destroy the first or last element...)

But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.

It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).

Matthieu M.
I've posted Python one-liner based on your answer http://stackoverflow.com/questions/2605766/how-to-find-a-duplicate-element-in-an-array-of-shuffled-consecutive-integers/2612318#2612318
J.F. Sebastian
+4  A: 

A non destructive version of solution by Franci Penov.

This can be done by making use of the XOR operator.

Lets say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the index:                        0, 1, 2, 3, 4

Now do an XOR of all the elements and all the indices. We get 2, which is the duplicate element. This happens because, 0 plays no role in the XORing. The remaining n-1 indices pair with same n-1 elements in the array and the only unpaired element in the array will be the duplicate.

int i;
int dupe = 0;
for(i=0;i<N;i++) {
 dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.

The best feature of this solution is that it does not suffer from overflow problems that is seen in the addition based solution.

Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)

This makes use of an additional variable so does not meet the requirements in the question completely.

codaddict
+1  A: 

One line solution in Python

arr = [1,3,2,4,2]
print reduce(lambda acc, (i, x): acc ^ i ^ x, enumerate(arr), 0)
# -> 2

Explanation on why it works is in @Matthieu M.'s answer.

J.F. Sebastian
+1, well done: even though it's not a code golf, using python's built-in loops is faster :)
Matthieu M.
A: 

A triangle number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among given 1001 natural numbers, one and only one number is duplicated, you can easily sum all given numbers and subtract T(1000). The result will contain this duplicate.

For a triangular number T(n), if n is any power of 10, there is also beautiful method finding this T(n), based on base-10 representation:

n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s
psihodelia