ansaurus

Question

Calculating Binomial Coefficient (nCk) for large n & k

Answer 1

A:

psudo code for calculating nCk:

result = 1    
for i=1 to min{K,N-K}:
   result *= N-i+1
   result /= i
return result

Time Complexity: O(min{K,N-K})

The loop goes from i=1 to min{K,N-K} instead of from i=1 to K, and that's ok because

C(k,n) = C(k, n-k)

And you can calculate the thing even more efficiently if you use the GammaLn function.

nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))

The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function but that algorithm isn't trivial at all.

snakile 2010-08-21 11:09:05

Given the possible size of both N and K, I'd say your products are going to overflow pretty quick.

walkytalky 2010-08-21 11:19:48

@walkytalky You were right. I changed the psudo-code. Now we don't calculate the nuerator and denominator seperately. That's better, right?

snakile 2010-08-21 11:33:48

@snakile: not better since the sizes for the problem were chosen to make the naive approach too slow, and the mod 1009 part you are leaving out, and if you were to include it you would crash trying to calculate 0/0.

GregS 2010-08-21 11:49:33

The order is too high. The time limit is 3 sec.Moreover your answer won't be exact.

AKGMA 2010-08-21 11:53:24

@snakile it's better. But in the context of the question I'd also note that you're only interested the result modulo 1009. I suspect we could make use of this to limit ourselves to O(10^3) operations rather than O(10^15). Instinctively it seems there'll probably be some rule like: if (N-K) > 2018, the result will always be 0. But those numbers are plucked from nowhere -- I'd have to work through it to check.

walkytalky 2010-08-21 11:54:49

Checking whether the result is 0(mod 1009) is easy. you can just count how many 1009s are there in n! and how many in k!(n-k)!and one point i thought a lot about was that 1009 is a prime. Doesn't this help?

AKGMA 2010-08-21 12:00:24

@AKGMA: I think your reasoning is a good start towards coming up with an algorithm.

GregS 2010-08-21 12:20:50

Thanks for that!

AKGMA 2010-08-21 13:24:27

oops, I somehow forgot about the mod 1009. The question is much more interesting than I thought. Turns out my answer is useless. I think the moron's answer (with Lucas' Theorem) solves this but I'm not sure.

snakile 2010-08-21 16:03:55

Answer 2

+4 A:

Binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.

Observe that if 1009, (including multiples of it), appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of (1009)^2 as two, and so on.

After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.

Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer), since 1 ... 1008 are the p-1 non-zero elements of the prime field over p. Therefore they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses.

So for example consider the case of C((1009^3), 200).

Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.

On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.

That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.

We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).

Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!

Steve Jessop 2010-08-21 13:00:04

What about this case?C(1000000000000000,500000000000000)It's not completely clear what you mean. can you please write a pseudo-code or sth.?

AKGMA 2010-08-21 13:23:32

Not my homework ;-) But I've tried to explain what I mean with an example.

Steve Jessop 2010-08-21 13:36:16

Thanks! I got it.It's a nice and creative algorithm

AKGMA 2010-08-21 19:38:52

Thanks. I hope I haven't completely wasted my time, and that the proof of Lucas' theorem starts out along these lines, counts up all the things I mention, sees (after some more cunning tricks) that nearly everything cancels out or collects together, and comes up with its answer. I haven't looked up any proofs, though, in case I'm wrong and there's one really simple trick that I've just totally missed ;-)

Steve Jessop 2010-08-21 19:47:58

Answer 3

+4 A:

Notice that 1009 is a prime.

Now you can use Lucas' Theorem.

Which states:

Let p be a prime.
If n  = a1a2...ar when written in base p and
if k  = b1b2...br when written in base p

(pad with zeroes if required)

Then

n choose k = (a1 choose b1) * (a2 choose  b2) * ... * (ar choose br) modulo p.

i.e. remainder of n choose k when divided by p is same as the remainder of
the product (a1 choose b1) * .... * (ar choose br) when divided by p.
Note: if bi > ai then ai choose bi is 0.

Thus your problem is reduced to finding the product modulo 1009 of at most log N/log 1009 numbers (number of digits of N in base 1009) of the form a choose b where a <= 1009 and b <= 1009.

This should make it easier even when N is close to 10^15.

Here is some code for Binomial I had written long back, all you need to do is to modify it to do the operations modulo 1009 (there might be bugs and not necessarily recommended coding style):

class Binomial
{
public:
    Binomial(int Max)
    {
        max = Max+1;
        table = new unsigned int * [max]();
        for (int i=0; i < max; i++)
        {
            table[i] = new unsigned int[max]();

            for (int j = 0; j < max; j++)
            {
                table[i][j] = 0;
            }
        }
    }

    ~Binomial()
    {
        for (int i =0; i < max; i++)
        {
            delete table[i];
        }
        delete table;
    }

    unsigned int Choose(unsigned int n, unsigned int k);

private:
    bool Contains(unsigned int n, unsigned int k);

    int max;
    unsigned int **table;
};

unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
    if (n < k) return 0;
    if (k == 0 || n==1 ) return 1;
    if (n==2 && k==1) return 2;
    if (n==2 && k==2) return 1;
    if (n==k) return 1;


    if (Contains(n,k))
    {
        return table[n][k];
    }
    table[n][k] = Choose(n-1,k) + Choose(n-1,k-1);
    return table[n][k];
}

bool Binomial::Contains(unsigned int n, unsigned int k)
{
    if (table[n][k] == 0) 
    {
        return false;
    }
    return true;
}

Moron 2010-08-21 13:29:46

Same comment as I made to dmuir, although I think the Wikipedia article answers it: C(2,7) is 0.

Steve Jessop 2010-08-21 14:04:17

@Steve: Yes, if a digit for k is bigger than the corresponding digit for n, the answer is 0. In fact that is one of the criteria for tests of divisibility of binomial coefficients by a prime.

Moron 2010-08-21 14:07:25

and yes, that is the definition of n choose k when k > n.

Moron 2010-08-21 14:22:35

This is a better answer

AKGMA 2010-08-21 19:40:12

Answer 4

+4 A:

I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14) will require something like 1e16 bits ~ 1000 terabytes

Moreover executing the loop in snakiles answer 1e15 times seems like it might take a while. What you might use is, if

n = n0 + n1*p + n2*p^2 ... + nd*p^d

m = m0 + m1*p + m2*p^2 ... + md*p^d

(where 0<=mi,ni < p)

then C(n,m) = C(n0,m0) * C(n1,m1) *... * C(nd, nd) mod p

see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html

One way would be to use pascal's triangle to build a table of all C(m,n) for 0<=m<=n<=1009.

dmuir 2010-08-21 13:43:13

Hey! I said the same thing :)

Moron 2010-08-21 14:00:14

Ah, yes, that cuts straight through all my messing about with 1008! = -1. What if m0 > n0, though?

Steve Jessop 2010-08-21 14:02:31

Oops, I should have added, as moron did in his superior answer, that the convention is C(r,s) = 0 if r<s.

dmuir 2010-08-22 17:50:49

ansaurus

tags:

views:

answers:

Calculating Binomial Coefficient (nCk) for large n & k

related questions