views:

10324

answers:

15

There is an array of size n (numbers are between 0->n-3) and only 2 numbers are repeated.

Elements are placed randomly in the array.

e.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.

What is the best way to find the repeated numbers?

P.S. [You should not use sorting]

+1  A: 

Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.

Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. something like this:

int count [10];

for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}

Then just search your array for any numbers greater than 1. Those are the items with duplicates. Only requires one pass across the original array and one pass across the count array.

Steve Rowe
His or her teacher put restrictions on sorting.
KingNestor
I agree it's probably just a homework question ... however, this can be a real problem if the elements are not comparable.
tjdonaldson
How would you have incomparable elements? You just need to force some ordering on them. Use whatever the criteria is for sameness to derive that ordering system.
Steve Rowe
given the requirement that there are only ever two repeated numbers it would be better to just return the value once a duplicate is found.
mezoid
Of course you can force an ordering on anything ... but how would you do that for elements that are not already pre-defined as comparable? That should be part of the solution. Also, it can be tough in practice to have small enough ranges to make counting occurrences feasible.
tjdonaldson
@tj the problem specifies they are numbers and that they are in the range of 0 to n-3. That would be comparable and small enough.
Steve Rowe
A: 

For each number: check if it exists in the rest of the array.

mookid8000
The running time on that would be horrible though. O(N^2). Probably needs a better solution.
Steve Rowe
Indeed. This is less efficient than sorting.
jrockway
But as sorting is not allowed, it's a reasonable solution. It's certainly more practical than using an array to hold a count for all elements in the domain.
tjdonaldson
And 'best way' may be different than what one might think :)
Aleris
This is what I get from answering a vague question, I guess :-) it's the best way if you consider the (humanly percieved, not algorithmic) complexity of the algorithm...
mookid8000
@tjdonaldson: is it the 'best' way when N = 100,000,000? Doesn't sound very reasonable to me.
MahlerFive
This solution is O(n^2). There is a solution with sorting that would be O(n log(n)). The best solution is O(n), and no, it doesn't need an array for counting.
Svante
It's funny that so many people consider performance the only quality metric - what about one's ability to quickly understand the algorithm and not wasting your brain cycles grokking some local optimization to a simple problem?
mookid8000
The optimal solution is "grad student - write a program to find repeated numbers in this array, have it ready for tomorrow"
Martin Beckett
This would also require an extra array, or else you would check the repeating numbers several times.
sindre j
+6  A: 

Insert each element into a set/hashtable, first checking if its are already in it.

tjdonaldson
If the total number of distinct values are small (less than 100) it really doesn't matter if it's a set or not. Searching through a list linearly will be many times faster.
John Leidegren
@John If the number is small it doesn't matter, because both approaches are fast. For large n a hashtable or tree implementation of a set is much better. Plus it is not good to choose a list for what conceptually is a set.
starblue
@starblue I would argue against that. My opinion is that linearity is always favorable over trees, hash tables and what not.
John Leidegren
Hashing as pointless, as you already know the exact domain, which is conveniently integers starting at 0.
recursive
I think hashtable are better than arrays, as you need not specify the size while creating them, which is not the case with arrays
Learner
A: 

Without sorting you're going to have a keep track of numbers you've already visited.

in psuedocode this would basically be (done this way so I'm not just giving you the answer):

for each number in the list
   if number not already in unique numbers list
      add it to the unique numbers list
   else
      return that number as it is a duplicate
   end if
end for each
mezoid
Isn't the running time of this still O(N^2)? The length of the unique numbers list will grow to nearly the length of the array and have to be searched each number. I suppose it could be a sorted list so O(N * LogN) might be possible.
Steve Rowe
If the list is replaced by a hash table or set, it's faster than O(n^2).
tjdonaldson
well, the best approach would be to sort the list first...but since that isn't an option we just want to find the first duplicate and return it. In the best case we will return very quickly...worse case we'll return after examining the entire list. I'm not sure how that can be avoided given no sort
mezoid
If you use proper storage for "unique number list", like a tree, then this too is O(N log N). If you spend more memory and use e.g. a bool[] indicating seen status, it even becomes O(N) in time but O(MAX_NUMBER) in space.
David Schmitt
@tjdonaldson Yes, if I were writing it in C# I'd definitely put it in a hash table. But since the language hasn't been specified I wrote it in psuedocode...that way if he's writing in Basic or Assembler etc he'll be able to figure something out...
mezoid
i don't understand, why this was voted down. Very normal, common, practical approach. Just do not treat "unique numbers list" too literally.
eugensk00
+14  A: 

There is a O(n) solution if you know what the possible domain of input is. For example if your input array contains numbers between 0 to 100, consider the following code.

bool flags[100];
for(int i = 0; i < 100; i++)
    flags[i] = false;

for(int i = 0; i < input_size; i++)
    if(flags[input_array[i]])
         return input_array[i];
    else       
        flags[input_array[i]] = true;

Of course there is the additional memory but this is the fastest.

Sesh
If the elements are integers or strings, which would be pretty common, then this approach is not going to work.
tjdonaldson
The question already specified the domain of the input, so this is perfectly acceptable.
Eclipse
A hash table can replace the array and it would work for any input then. As I mentioned, the downside is the additional memory requirement but speed wise it works.
Sesh
@sesh This is the best technique but it can be done even better - if the numbers range from 0-n then all you need are n BITS, not bytes or bools. It sounds pedantic, but often these type of questions will be phrased where n is in the billions.
Andrew Grant
Andrew - you are right. Only I was too lazy to type the bit operations - its crazy typing code in a text editor :)
Sesh
It is possible (with some constraint) to write this algorithm in O(1) memory. See http://stackoverflow.com/questions/555744/algorithm-to-find-two-repeated-numbers-in-an-array-without-sorting/556686#556686
J.F. Sebastian
+5  A: 

Check this old but good paper on the topic:

CMS
This solution is O(n*log(n)) in this case (due to there are only 2 duplicates in the array). Therefore it is not better than sorting in this case. The best solution should take into account that the number of duplicates is 2 and all values are in [0, n-3] range.
J.F. Sebastian
+6  A: 

You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + two missing numbers.

Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.

Eclipse
not if they are not consecutive numbers
Sesh
The question specified that the numbers are between 0 and n-3, plus the two repeated numbers, so every number between 0 and n-3 must be in the array.
Eclipse
even if i know the sum, then what??
Aman Jain
This doesn't work. -1.
Andrew Grant
The sum of the array elements includes one occurrence, so the formula would be:x = (n-2) * (n-3) / 2 - Sum of array elements.
dirkgently
If you combine the sum with the product, the result will be unique: http://stackoverflow.com/questions/555744/algorithm-to-find-two-repeated-numbers-in-an-array-without-sorting/556744#556744
sdfx
Yup - it was sitting there in the back of my mind, but I wasn't getting that last bit of information (I was stuck trying to figure out a solution using XOR - which I still think might be potentially faster than the sum-of-squares)
Eclipse
+14  A: 

OK, seems I just can't give it a rest :)

Simplest solution

int A[N] = {...};

int signed_1(n) { return n%2<1 ? +n : -n;  } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(n) { return n%4<2 ? +n : -n;  } // 0,+1,-2,-3,+4,+5,-6,-7,...

long S1 = 0;  // or int64, or long long, or some user-defined class
long S2 = 0;  // so that it has enough bits to contain sum without overflow

for (int i=0; i<N-2; ++i)
{
   S1 += signed_1(A[i]) - signed_1(i);
   S2 += signed_2(A[i]) - signed_2(i);
} 

for (int i=N-2; i<N; ++i)
{
   S1 += signed_1(A[i]);
   S2 += signed_2(A[i]);
} 

S1 = abs(S1)
S2 = abs(S2)

p = (S1+S2)/2;
q = abs(S1-S2)/2;

One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().

Previous solution

I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:

Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)

so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2

Simple math remains:

  (1)  p+q = S1  
  (2)  p^2+q^2 = S2

Surely you won't solve it as math classes teach to solve square equations.

First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.

eugensk00
O(N) and only requires two integer variables to store the sums. Lovely.
Sol
Lovely, but maybe confusing. p+q != p^2+p^2.
Mart Oruaas
I don't think this solution works perfectly. Take a look at this array: int A[] = {2, 0, 6, 1, 1, 4, 2, 3, 5}; where n = 9. the result I get is {1, 0} -- although it did work for the array example the OP gave...
Sev
A: 

How about this:

for (i=0; i<n-1; i++) {
  for (j=i+1; j<n; j++) {
    if (a[i] == a[j]) {
        printf("%d appears more than once\n",a[i]);
        break;
    }
  }
}

Sure it's not the fastest, but it's simple and easy to understand, and requires no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)

Dave
This solution is O(n**2). Standard `sort` routine usually is O(n*log(n)) and it costs you nothing to develop and maintain.
J.F. Sebastian
+3  A: 

Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.

For example, here's a relevant part from my answer:

bool has_duplicates(int* a, int m, int n)
{
  /** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')

      Whether a[] array has duplicates.

      precondition: all values are in [n, n+m) range.

      feature: It marks visited items using a sign bit.
  */
  assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
  for (int *p = a; p != &a[m]; ++p) {
    *p -= (n - 1); // [n, n+m) -> [1, m+1)
    assert(*p > 0);
  }

  // determine: are there duplicates
  bool has_dups = false;
  for (int i = 0; i < m; ++i) {
    const int j = abs(a[i]) - 1;
    assert(j >= 0);
    assert(j < m);
    if (a[j] > 0)
      a[j] *= -1; // mark
    else { // already seen
      has_dups = true;
      break;
    }
  }

  // restore the array
  for (int *p = a; p != &a[m]; ++p) {
    if (*p < 0) 
      *p *= -1; // unmark
    // [1, m+1) -> [n, n+m)
    *p += (n - 1);        
  }

  return has_dups; 
}

The program leaves the array unchanged (the array should be writeable but its values are restored on exit).

It works for array sizes upto INT_MAX (on 64-bit systems it is 9223372036854775807).

J.F. Sebastian
+5  A: 

You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.

You can calculate the sum and the product over the array, resulting in:

1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2

So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get

sum(Array) - (n-3)(n-2)/2 = x = p + q

Now do the same for the product:

1 * 2 * ... * n - 3 * p * q = (n - 3)! * p * q

prod(Array) / (n - 3)! = y = p * q

Your now got these terms:

x = p + q

y = p * q

=> y(p + q) = x(p * q)

If you transform this term, you should be able to calculate p and q

sdfx
From Viète's theorem it follows that `p` and `q` are roots of the equation: z**2 - (p + q)*z + p*q = 0 therefore p,q = [k/2 + 1/2*(-4*m + k**2)**(1/2), k/2 - 1/2*(-4*m + k**2)**(1/2)], where k=x, m=y.
J.F. Sebastian
+1  A: 

Here's implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it is requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrary large integer sequences (it reads one integer at a time therefore a whole sequence doesn't require to be in memory).

def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i     # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i) 
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2

    p = (s1 - int((2*s2 - s1**2)**.5)) // 2 
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p

Example:

>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)

A more verbose version of the above code follows with explanation:

def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.

    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items

    s1 = (n-2) + (n-1)       # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits  
    s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i     
        s2 += j*j - i*i

    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))

    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))

    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2

    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)

    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q

NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.

J.F. Sebastian
+1  A: 
suppose array is

a[0], a[1], a[2] ..... a[n-1]

sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n]*a[n]

sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2

similarly

sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6

Suppose repeated elements are = X and Y

so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;

So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
GG
A: 

for(i=1;i<=n;i++) { if(!(arr[i] ^ arr[i+1])) printf("Found Repeated number %5d",arr[i]); }

srinunaik
A: 

Why should we try out doing maths ( specially solving quadratic equations ) these are costly op . Best way to solve this would be t construct a bitmap of size (n-3) bits , i.e, (n -3 ) +7 / 8 bytes . Better to do a calloc for this memory , so every single bit will be initialized to 0 . Then traverse the list & set the particular bit to 1 when encountered , if the bit is set to 1 already for that no then that is the repeated no . This can be extended to find out if there is any missing no in the array or not. This solution is O(n) in time complexity

Saikat banerjee