ansaurus

Question

Algorithm to find two repeated numbers in an array, without sorting

Answer 1

+1 A:

Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.

Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. something like this:

int count [10];

for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}

Then just search your array for any numbers greater than 1. Those are the items with duplicates. Only requires one pass across the original array and one pass across the count array.

Steve Rowe 2009-02-17 07:00:35

His or her teacher put restrictions on sorting.

KingNestor 2009-02-17 07:04:57

I agree it's probably just a homework question ... however, this can be a real problem if the elements are not comparable.

tjdonaldson 2009-02-17 07:06:51

How would you have incomparable elements? You just need to force some ordering on them. Use whatever the criteria is for sameness to derive that ordering system.

Steve Rowe 2009-02-17 07:08:05

given the requirement that there are only ever two repeated numbers it would be better to just return the value once a duplicate is found.

mezoid 2009-02-17 07:10:19

Of course you can force an ordering on anything ... but how would you do that for elements that are not already pre-defined as comparable? That should be part of the solution. Also, it can be tough in practice to have small enough ranges to make counting occurrences feasible.

tjdonaldson 2009-02-17 07:16:51

@tj the problem specifies they are numbers and that they are in the range of 0 to n-3. That would be comparable and small enough.

Steve Rowe 2009-02-17 07:20:58

Answer 2

A:

For each number: check if it exists in the rest of the array.

mookid8000 2009-02-17 07:01:35

The running time on that would be horrible though. O(N^2). Probably needs a better solution.

Steve Rowe 2009-02-17 07:05:41

Indeed. This is less efficient than sorting.

jrockway 2009-02-17 07:11:36

But as sorting is not allowed, it's a reasonable solution. It's certainly more practical than using an array to hold a count for all elements in the domain.

tjdonaldson 2009-02-17 07:18:26

And 'best way' may be different than what one might think :)

Aleris 2009-02-17 07:31:56

This is what I get from answering a vague question, I guess :-) it's the best way if you consider the (humanly percieved, not algorithmic) complexity of the algorithm...

mookid8000 2009-02-17 07:34:38

@tjdonaldson: is it the 'best' way when N = 100,000,000? Doesn't sound very reasonable to me.

MahlerFive 2009-02-17 07:45:38

This solution is O(n^2). There is a solution with sorting that would be O(n log(n)). The best solution is O(n), and no, it doesn't need an array for counting.

Svante 2009-02-17 14:05:13

It's funny that so many people consider performance the only quality metric - what about one's ability to quickly understand the algorithm and not wasting your brain cycles grokking some local optimization to a simple problem?

mookid8000 2009-02-17 18:07:03

The optimal solution is "grad student - write a program to find repeated numbers in this array, have it ready for tomorrow"

Martin Beckett 2009-02-18 04:34:12

This would also require an extra array, or else you would check the repeating numbers several times.

sindre j 2009-10-22 16:29:18

Answer 3

+6 A:

Insert each element into a set/hashtable, first checking if its are already in it.

tjdonaldson 2009-02-17 07:05:01

If the total number of distinct values are small (less than 100) it really doesn't matter if it's a set or not. Searching through a list linearly will be many times faster.

John Leidegren 2009-02-17 07:29:37

@John If the number is small it doesn't matter, because both approaches are fast. For large n a hashtable or tree implementation of a set is much better. Plus it is not good to choose a list for what conceptually is a set.

starblue 2009-02-17 08:17:11

@starblue I would argue against that. My opinion is that linearity is always favorable over trees, hash tables and what not.

John Leidegren 2009-02-17 14:03:29

Hashing as pointless, as you already know the exact domain, which is conveniently integers starting at 0.

recursive 2009-02-18 18:04:32

I think hashtable are better than arrays, as you need not specify the size while creating them, which is not the case with arrays

Learner 2009-07-07 08:06:30

Answer 4

A:

Without sorting you're going to have a keep track of numbers you've already visited.

in psuedocode this would basically be (done this way so I'm not just giving you the answer):

for each number in the list
   if number not already in unique numbers list
      add it to the unique numbers list
   else
      return that number as it is a duplicate
   end if
end for each

mezoid 2009-02-17 07:06:08

Isn't the running time of this still O(N^2)? The length of the unique numbers list will grow to nearly the length of the array and have to be searched each number. I suppose it could be a sorted list so O(N * LogN) might be possible.

Steve Rowe 2009-02-17 07:17:37

If the list is replaced by a hash table or set, it's faster than O(n^2).

tjdonaldson 2009-02-17 07:20:40

well, the best approach would be to sort the list first...but since that isn't an option we just want to find the first duplicate and return it. In the best case we will return very quickly...worse case we'll return after examining the entire list. I'm not sure how that can be avoided given no sort

mezoid 2009-02-17 07:21:07

If you use proper storage for "unique number list", like a tree, then this too is O(N log N). If you spend more memory and use e.g. a bool[] indicating seen status, it even becomes O(N) in time but O(MAX_NUMBER) in space.

David Schmitt 2009-02-17 07:22:38

@tjdonaldson Yes, if I were writing it in C# I'd definitely put it in a hash table. But since the language hasn't been specified I wrote it in psuedocode...that way if he's writing in Basic or Assembler etc he'll be able to figure something out...

mezoid 2009-02-17 07:22:59

i don't understand, why this was voted down. Very normal, common, practical approach. Just do not treat "unique numbers list" too literally.

eugensk00 2009-02-18 04:03:21

Answer 5

+14 A:

There is a O(n) solution if you know what the possible domain of input is. For example if your input array contains numbers between 0 to 100, consider the following code.

bool flags[100];
for(int i = 0; i < 100; i++)
    flags[i] = false;

for(int i = 0; i < input_size; i++)
    if(flags[input_array[i]])
         return input_array[i];
    else       
        flags[input_array[i]] = true;

Of course there is the additional memory but this is the fastest.

Sesh 2009-02-17 07:08:15

If the elements are integers or strings, which would be pretty common, then this approach is not going to work.

tjdonaldson 2009-02-17 07:11:28

The question already specified the domain of the input, so this is perfectly acceptable.

Eclipse 2009-02-17 07:13:21

A hash table can replace the array and it would work for any input then. As I mentioned, the downside is the additional memory requirement but speed wise it works.

Sesh 2009-02-17 07:16:40

@sesh This is the best technique but it can be done even better - if the numbers range from 0-n then all you need are n BITS, not bytes or bools. It sounds pedantic, but often these type of questions will be phrased where n is in the billions.

Andrew Grant 2009-02-17 08:17:28

Andrew - you are right. Only I was too lazy to type the bit operations - its crazy typing code in a text editor :)

Sesh 2009-02-17 09:54:20

It is possible (with some constraint) to write this algorithm in O(1) memory. See http://stackoverflow.com/questions/555744/algorithm-to-find-two-repeated-numbers-in-an-array-without-sorting/556686#556686

J.F. Sebastian 2009-02-17 14:01:45

Answer 6

+5 A:

Check this old but good paper on the topic:

Finding Repeated Elements (PDF)

CMS 2009-02-17 07:08:58

This solution is O(n*log(n)) in this case (due to there are only 2 duplicates in the array). Therefore it is not better than sorting in this case. The best solution should take into account that the number of duplicates is 2 and all values are in [0, n-3] range.

J.F. Sebastian 2009-02-17 15:12:18

Answer 7

+6 A:

You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + two missing numbers.

Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.

Eclipse 2009-02-17 07:09:41

not if they are not consecutive numbers

Sesh 2009-02-17 07:12:39

The question specified that the numbers are between 0 and n-3, plus the two repeated numbers, so every number between 0 and n-3 must be in the array.

Eclipse 2009-02-17 07:14:47

even if i know the sum, then what??

Aman Jain 2009-02-17 07:18:00

This doesn't work. -1.

Andrew Grant 2009-02-17 08:19:08

The sum of the array elements includes one occurrence, so the formula would be:x = (n-2) * (n-3) / 2 - Sum of array elements.

dirkgently 2009-02-17 08:24:21

If you combine the sum with the product, the result will be unique: http://stackoverflow.com/questions/555744/algorithm-to-find-two-repeated-numbers-in-an-array-without-sorting/556744#556744

sdfx 2009-02-17 17:09:00

Yup - it was sitting there in the back of my mind, but I wasn't getting that last bit of information (I was stuck trying to figure out a solution using XOR - which I still think might be potentially faster than the sum-of-squares)

Eclipse 2009-02-17 17:58:35

Answer 8

+14 A:

OK, seems I just can't give it a rest :)

Simplest solution

int A[N] = {...};

int signed_1(n) { return n%2<1 ? +n : -n;  } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(n) { return n%4<2 ? +n : -n;  } // 0,+1,-2,-3,+4,+5,-6,-7,...

long S1 = 0;  // or int64, or long long, or some user-defined class
long S2 = 0;  // so that it has enough bits to contain sum without overflow

for (int i=0; i<N-2; ++i)
{
   S1 += signed_1(A[i]) - signed_1(i);
   S2 += signed_2(A[i]) - signed_2(i);
} 

for (int i=N-2; i<N; ++i)
{
   S1 += signed_1(A[i]);
   S2 += signed_2(A[i]);
} 

S1 = abs(S1)
S2 = abs(S2)

p = (S1+S2)/2;
q = abs(S1-S2)/2;

One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().

Previous solution

I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:

Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)

so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2

Simple math remains:

  (1)  p+q = S1  
  (2)  p^2+q^2 = S2

Surely you won't solve it as math classes teach to solve square equations.

First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.

eugensk00 2009-02-17 08:39:15

O(N) and only requires two integer variables to store the sums. Lovely.

Sol 2009-02-17 09:54:08

Lovely, but maybe confusing. p+q != p^2+p^2.

Mart Oruaas 2009-02-17 10:39:55

I don't think this solution works perfectly. Take a look at this array: int A[] = {2, 0, 6, 1, 1, 4, 2, 3, 5}; where n = 9. the result I get is {1, 0} -- although it did work for the array example the OP gave...

Sev 2009-06-26 23:07:41

Answer 9

A:

How about this:

for (i=0; i<n-1; i++) {
  for (j=i+1; j<n; j++) {
    if (a[i] == a[j]) {
        printf("%d appears more than once\n",a[i]);
        break;
    }
  }
}

Sure it's not the fastest, but it's simple and easy to understand, and requires no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)

Dave 2009-02-17 09:06:33

This solution is O(n**2). Standard `sort` routine usually is O(n*log(n)) and it costs you nothing to develop and maintain.

J.F. Sebastian 2009-02-17 14:14:11

Answer 10

+3 A:

Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.

For example, here's a relevant part from my answer:

bool has_duplicates(int* a, int m, int n)
{
  /** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')

      Whether a[] array has duplicates.

      precondition: all values are in [n, n+m) range.

      feature: It marks visited items using a sign bit.
  */
  assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
  for (int *p = a; p != &a[m]; ++p) {
    *p -= (n - 1); // [n, n+m) -> [1, m+1)
    assert(*p > 0);
  }

  // determine: are there duplicates
  bool has_dups = false;
  for (int i = 0; i < m; ++i) {
    const int j = abs(a[i]) - 1;
    assert(j >= 0);
    assert(j < m);
    if (a[j] > 0)
      a[j] *= -1; // mark
    else { // already seen
      has_dups = true;
      break;
    }
  }

  // restore the array
  for (int *p = a; p != &a[m]; ++p) {
    if (*p < 0) 
      *p *= -1; // unmark
    // [1, m+1) -> [n, n+m)
    *p += (n - 1);        
  }

  return has_dups; 
}

The program leaves the array unchanged (the array should be writeable but its values are restored on exit).

It works for array sizes upto INT_MAX (on 64-bit systems it is 9223372036854775807).

J.F. Sebastian 2009-02-17 13:03:42

Answer 11

+5 A:

You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.

You can calculate the sum and the product over the array, resulting in:

1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2

So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get

sum(Array) - (n-3)(n-2)/2 = x = p + q

Now do the same for the product:

1 * 2 * ... * n - 3 * p * q = (n - 3)! * p * q

prod(Array) / (n - 3)! = y = p * q

Your now got these terms:

x = p + q

y = p * q

=> y(p + q) = x(p * q)

If you transform this term, you should be able to calculate p and q

sdfx 2009-02-17 13:19:33

From Viète's theorem it follows that `p` and `q` are roots of the equation: z**2 - (p + q)*z + p*q = 0 therefore p,q = [k/2 + 1/2*(-4*m + k**2)**(1/2), k/2 - 1/2*(-4*m + k**2)**(1/2)], where k=x, m=y.

J.F. Sebastian 2009-02-17 13:58:23

Answer 12

+1 A:

Here's implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it is requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrary large integer sequences (it reads one integer at a time therefore a whole sequence doesn't require to be in memory).

def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i     # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i) 
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2

    p = (s1 - int((2*s2 - s1**2)**.5)) // 2 
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p

Example:

>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)

A more verbose version of the above code follows with explanation:

def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.

    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items

    s1 = (n-2) + (n-1)       # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits  
    s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i     
        s2 += j*j - i*i

    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))

    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))

    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2

    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)

    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q

NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.

J.F. Sebastian 2009-02-18 17:59:16

Answer 13

+1 A:

suppose array is

a[0], a[1], a[2] ..... a[n-1]

sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n]*a[n]

sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2

similarly

sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6

Suppose repeated elements are = X and Y

so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;

So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)

GG 2009-09-15 14:04:29

Answer 14

A:

for(i=1;i<=n;i++) { if(!(arr[i] ^ arr[i+1])) printf("Found Repeated number %5d",arr[i]); }

srinunaik 2009-11-04 09:40:51

Answer 15

A:

Why should we try out doing maths ( specially solving quadratic equations ) these are costly op . Best way to solve this would be t construct a bitmap of size (n-3) bits , i.e, (n -3 ) +7 / 8 bytes . Better to do a calloc for this memory , so every single bit will be initialized to 0 . Then traverse the list & set the particular bit to 1 when encountered , if the bit is set to 1 already for that no then that is the repeated no . This can be extended to find out if there is any missing no in the array or not. This solution is O(n) in time complexity

Saikat banerjee 2010-06-14 15:27:19

ansaurus

tags:

views:

answers:

Algorithm to find two repeated numbers in an array, without sorting

Simplest solution

Previous solution

related questions