views:

255

answers:

6

Hi, We've got some nonnegative numbers. We want to find the pair with maximum gcd. actually this maximum is more important than the pair! For example if we have:

2 4 5 15

gcd(2,4)=2

gcd(2,5)=1

gcd(2,15)=1

gcd(4,5)=1

gcd(4,15)=1

gcd(5,15)=5

the answer is 5.

+4  A: 

You can use the Euclidean Algorithm to find the GCD of two numbers.

while (b != 0) 
{
    int m = a % b;
    a = b;
    b = m;
}
return a;
Garee
I know this. but what about more than two? I want the maximum gcd between a pair of these numbers. for example if numbers are:2,4,5,10 the answer is 5, but of course it's not efficient to calculate gcd for each pair
AKGMA
For more than two you can use recursive formula gcd(a[1], ..., a[n]) = gcd(gcd(a[1], ..., a[n - 1]), a[n]).
kemiisto
Also there might exist some other algorithms. http://www.springerlink.com/content/gm5wjjcaaxx544w1/
kemiisto
This would be a O(n^2) solution. GCD every pair of numbers, keeping the highest found. I think the answer is to be more efficient.
Ishtar
@kemiisto - I think your algorithm would return the minimum, not maximum.
kunigami
+1  A: 

pseudocode

function getGcdMax(array[])

    arrayUB=upperbound(array)
    if (arrayUB<1)
        error
    pointerA=0
    pointerB=1

    gcdMax=0

    do
        gcdMax=MAX(gcdMax,gcd(array[pointera],array[pointerb]))
        pointerB++
        if (pointerB>arrayUB)
            pointerA++
            pointerB=pointerA+1
    until (pointerB>arrayUB)

    return gcdMax
El Ronnoco
Thanks for writing the pseudocode but i believe what you said is pretty obvious. I'm looking for a much faster solution. sth like O(nlgn) , the O(n^2) is not hard!
AKGMA
Yes I know it's bruteforce. You didn't specify any speed requirement in your post :)
El Ronnoco
Yeah! But i asked for an efficient way!
AKGMA
+3  A: 

The optimisations I can think of is

1) start with the two biggest numbers since they are likely to have most prime factors and thus likely to have the most shared prime factors (and thus the highest GCD).

2) When calculating the GCDs of other pairs you can stop your Euclidean algorithm loop if you get below your current greatest GCD.

Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).

Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)

Chris
Thanks! These were really really helpful.
AKGMA
+3  A: 

There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:

2^20 3^13 5^9 7^2*11^4 7^4*11^3

Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.

So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).

Rex Kerr
Thanks. Very helpful
AKGMA
"...but the only way to know that from looking at the GCDs is...", but we do not have to look at them, so we can beat `O(n log n)`. Computing the biggest GCD does not imply looking at every GCD. Just like there are faster sorting algorithms than `O(n log n)`
Ishtar
@Ishtar - Unfortunately, no obvious assumption will let you do better than looking at every GCD. There are some not-entirely-unreasonable assumptions that may let you do better (e.g. that the maximum size of a number in the list, `M`, is no larger than, say, `N^(3/2)` on average), but those assumptions weren't stated. For a general algorithm, where lists are non-gigantic and the size of the numbers is in general way bigger than the list length, my example holds.
Rex Kerr
If arithmetic is done in O(1), you can check if there are two numbers a,b with `gcd(a,b)!=1` in O(n) gcds. Compute product of all numbers `P`, and check if there is an i such that `gcd(P/a_i,a_i)` != 1. If so, do a linear search on `j` for `gcd(a_i,a_j) != 1`. I don't see how to exploit this trick to find maximum GCD in faster than Theta(n^2) time, however.
sdcvvc
+2  A: 

If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:

  • Create an array of a small integer type, indexes 1 to the max input. O(1)
  • For each value, increment the count of every element of the index which is a factor of the number (make sure you don't wraparound). O(N).
  • Starting at the end of the array, scan back until you find a value >= 2. O(1)

That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
4 2 1 1 2 0 0 0 0 0  0  0  0  0  1

I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.

You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:

  • Create an empty set of int, and a best-so-far value 1.
  • For each input integer:
    • if it's less than or equal to best-so-far, continue.
    • check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
      • add it to the set
      • repeat for all of its factors (larger than best-so-far).

Add/lookup in a set could be O(log N), although it depends what data structure you use. Each value has O(f(k)) factors, where k is the max value and I can't remember what the function f is...

The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.

I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.

Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:

Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.

Steve Jessop
This is a nice solution!
AKGMA
The bound should be a variable `M`. The second step has complexity `O(N*M)`. Thus, this is only a good idea if your list is long relative to the number of integers you have in your range. But this means that there must be repeats of the same number and you're probably better off radix sorting and looking for repeats (`O(N+M)`)!
Rex Kerr
Actually, since GCD generally takes `log(M)` time, I guess the second step has complexity `O(N*M)` instead of `O(N*N*log(M))`, so it is a good strategy when `M/log(M) < N`.
Rex Kerr
The number of divisors of a number `k` is bounded by `2*sqrt(k)` (not sure if it is tight).
kunigami
A further optimization would be sorting the input so you can stop the algorithm whenever you reach a number which is smaller than best-so-far.
kunigami
@Rex Kerr: Having stated my assumptions, I can do what I like ;-) You're quite right if "number" is taken in its proper, full generality, but experience shows that in many problems on SO, "number" does have some restrictions, so I'm throwing some ideas to see what sticks. If this was an exam question on algorithm theory, of course I wouldn't try any of this on.
Steve Jessop
A: 

There is a solution that would take O(n):

Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.

I haven't proved that this is always true, but in your example, it works:

m=2*4*5*15=600,

max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5


NOTE: This is not correct. If the number a_i has a factor p_j repeated twice, and if two other numbers also contain this factor, p_j, then you get the incorrect result p_j^2 insted of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.

However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine the 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out gcd(m/a_i, a_i), i!=3 which are less than or equal to 5 (in the example above, this filters out all others).


Added for clarification and justification:

To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.

Let's call gcd(m/a_i, a_i) as g_i, and max(gcd(a_i, a_j),j=1..n, j!=i) as r_i. What I say above is g_i=x_i*r_i, and x_i is an integer. It is obvious that r_i <= g_i, so in n gcd operations, we get an upper bound for r_i for all i.

The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)

Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, where j=1..1, j!=i. As a result, gcd(m/a_i, a_i) contains all gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results will divide g_i.

Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.

I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.

Dysaster
Do you have a proof for this?
AKGMA
Not a proof, no. Only a few simple facts that you can make an algorithm of the NOTE section above. I will edit the answer and add more details on why I think this should work.
Dysaster
I've written a sample test program to see the performance. The above algorithm does generate correct values, and with 1000 random numbers in the range 2 to 10000, it performs between 1999 and 13000 gcd operations (instead of 999000 of the brute force method). If I used 10000 numbers, it was between 19999 and 315000 gcd operations. I didn't check against brute force results for the 10000 set though. It was taking too long. :)
Dysaster
@Dysaster: your original suggestion wouldn't be O(n). The product of the n inputs has Ω(n) digits (assuming a non-trivial distribution of values), so each of the n divisions that you do is itself Ω(n), for total Ω(n^2). Interesting heuristic, though.
Steve Jessop
@Steve: I am not certain I follow you. Are we talking about the performance of large number arithmetic here? If that is the case, and if the numbers are indeed one word each (say, 32-bits), then you are correct. The multiplication will generate a number that is roughly n words. It is a good point that I haven't considered. A long division of n words by a single word and a gcd between this number and a single word number is not as slow as n gcd operations, but it is still not O(n).
Dysaster
@Dysaster: yes, you did follow me: as long as the inputs aren't mostly "1", their product will have roughly as many digits as all the input put together. Agreed that an n-sized division is faster than n gcd ops.
Steve Jessop