views:

459

answers:

6

The number of combinations of k items which can be retrieved from N items is described by the following formula.

             N! 
c =  ___________________ 
       (k! * (N - k)!)

An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.

Optimize this formula to run with the smallest O time complexity

This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly. e.g. and a subsequent discussion on the topic on another forum.

http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000

I'll post some info/links from that discussion after some people take a stab at the solution.

Any language is acceptable.

+2  A: 

I'd solve it in Mathematica:

Binomial[n, k]

Man, that was easy...

Pesto
WINRAR!!~!!1!!!!
Geoffrey Chetwood
And how does Mathematica calculate factorials?
Matt Kane
@Matt Kane: Very, very well.
Welbog
@Matt Kane: With magic.
Geoffrey Chetwood
@Matt Kane: Probably exactly the same way WolframAlpha does. It's a Wolfram product, too.
Pesto
@Rich B: I don't think WinRAR has that kind of functionality.
Sean Nyman
Well, what time complexity is it? That's what the question is looking for.
Matt Kane
@Matt Kane: It's one line of code, so it's obviously O(1). Duh.
Pesto
You are not being serious enough. This is the Internet. There is no joking on the Internet.
Matt Kane
@Matt Kane: I do not think you understand what code golf is.
Geoffrey Chetwood
I understand what code golf is, but in the problem is this bit : "Optimize this formula to run with the smallest O time complexity"If you don't know the complexity you haven't really made an attempt.
Matt Kane
+3  A: 
MizardX
Very pythonic, I was about to post more or less the same thing :P
Andrea Ambu
This will choke and die hilariously on 20000000 Choose 15000000. Also, it is only O(min[k,n-k]) if you indulge in the fiction that operations on huge integers can be performed in constant time.
PeterAllenWebb
My previous comment sounds dismissive to me on re-reading. Apologies if you took offense. But yes, the result for 20000000 Choose 15000000 has almost 5 million digits. The multiplicands in the later stages won't even fit into L2 cache on most processors, so things are bound to be very slow. What facilities does python have for extremely large floating point numbers? Does anyone know?
PeterAllenWebb
If both operands to the multiplication is larger than 1050 bits (70 * 15 bits/digit), then python will use Karatsuba's multiplication algorithm. It's time complexity is only O(n^1.584), compared to the "gradeshool" algorithm O(n^2). On the other operations, there are similar optimizations.
MizardX
+1  A: 

Given a reasonable number of values for n and K, calculate them in advance and use a lookup table.

It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.

Brian Agnew
I assume you mean calculate factorials in a lookup and divide the nth entry by the kth and (n-k)th entry. Otherwise you use a two dimensional lookup table (and would still need to specify an algorithm to avoid the horrifying divisions required in the 1-dimensional approach).
David Berger
I hadn't given it much thought at all, beyond making use of a lookup table for some/all of the above. Obviously you have penalties in terms of memory and the initial calculation (would you need *all* permutations ?), hence I hedged myself to some small degree by saying for 'reasonable' numbers of values, and deliberately not defining that!
Brian Agnew
I think memoizing factorials would be better if you need to do it for much data.
Andrea Ambu
+4  A: 

Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.

Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.

If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:

  • 3 evaluations of a Stirling approximation, each requiring O(log n) multiplications and one square root evaluation.
  • 2 multiplications
  • 1 divisions

The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.

EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.

EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.

PeterAllenWebb
+1  A: 

MATLAB:

  • The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)

    nchoosek(N,k)
    
  • My solution: 36 characters, O(min(k,N-k))

    a=min(k,N-k);
    prod(N-a+1:N)/prod(1:a)
    
gnovice
+2  A: 

Python: approximation in O(1) ?

Using python decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).

from decimal import Decimal

ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()

pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')

# Stirling's approximation of the gamma-funciton.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z

def choose(n, k):
  n = Decimal(str(n))
  k = Decimal(str(k))
  return gamma(n+1)/gamma(k+1)/gamma(n-k+1)

Example:

>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')

Any higher, and it will overflow. The exponent seems to be limited to 40000000.

MizardX