views:

124

answers:

6

I want to pre-compute some values for each combination in a set of combinations. For example, when choosing 3 numbers from 0 to 12, I'll compute some value for each one:

>>> for n in choose(range(13), 3):
    print n, foo(n)

(0, 1, 2) 78
(0, 1, 3) 4
(0, 1, 4) 64
(0, 1, 5) 33
(0, 1, 6) 20
(0, 1, 7) 64
(0, 1, 8) 13
(0, 1, 9) 24
(0, 1, 10) 85
(0, 1, 11) 13
etc...

I want to store these values in an array so that given the combination, I can compute its and get the value. For example:

>>> a = [78, 4, 64, 33]
>>> a[magic((0,1,2))]
78

What would magic be?

Initially I thought to just store it as a 3-d matrix of size 13 x 13 x 13, so I can easily index it that way. While this is fine for 13 choose 3, this would have way too much overhead for something like 13 choose 7.

I don't want to use a dict because eventually this code will be in C, and an array would be much more efficient anyway.

UPDATE: I also have a similar problem, but using combinations with repetitions, so any answers on how to get the rank of those would be much appreciated =).

UPDATE: To make it clear, I'm trying to conserve space. Each of these combinations actually indexes into something take up a lot of space, let's say 2 kilobytes. If I were to use a 13x13x13 array, that would be 4 megabytes, of which I only need 572 kilobytes using (13 choose 3) spots.

+4  A: 

You can try using the lexicographic index of the combination. Maybe this page will help: http://saliu.com/bbs/messages/348.html

This MSDN page has more details: Generating the mth Lexicographical Element of a Mathematical Combination.

To be a bit more specific:

When treated as a tuple, you can order the combinations lexicographically.

So (0,1,2) < (0,1,3) < (0,1,4) etc.

Say you had the number 0 to n-1 and chose k out of those.

Now if the first element is zero, you know that it is one among the first n-1 choose k-1.

If the first element is 1, then it is one among the next n-2 choose k-1.

This way you can recursively compute the exact position of the given combination in the lexicographic ordering and use that to map it to your number.

This works in reverse too and the MSDN page explains how to do that.

Moron
+1 I've never seen it explained as well as it is on the msdn page (I never would have thought to search for something like this there either). This way he could use the lexicographic index as an array index and practically get a perfect hash.
IVlad
@IVlad: Yes, I was surprised that to find that on MSDN!
Moron
Hmm it doesn't seem to work. e.g. (0, 1, 4) should have rank 2: (0,1,2),(0,1,3),(0,1,4), but doing (4 choose 3) + (1 choose 2) + (0 choose 1) gives 4..?
Claudiu
You don't add the numbers you get in the intermediate. If it is the smallest, you add a zero (i.e. do nothing). It might get tricky, though.
Moron
+2  A: 

Use a hash table to store the results. A decent hash function could be something like:

h(x) = (x1*p^(k - 1) + x2*p^(k - 2) + ... + xk*p^0) % pp

Where x1 ... xk are the numbers in your combination (for example (0, 1, 2) has x1 = 0, x2 = 1, x3 = 2) and p and pp are primes.

So you would store Hash[h(0, 1, 2)] = 78 and then you would retrieve it the same way.

Note: the hash table is just an array of size pp, not a dict.

IVlad
Could I get a reason for the downvote?
IVlad
I was wondering that myself. That's why the self-defence edit to my answer, which is obviously very similar to yours.
Steve314
No idea for the downvote. Seems reasonably fine, except that you probably need to find p >= n (pp could be smaller I suppose).
Moron
+2  A: 

I would suggest a specialised hash table. The hash for a combination should be the exclusive-or of the hashes for the values. Hashes for values are basically random bit-patterns.

You could code the table to cope with collisions, but it should be fairly easy to derive a minimal perfect hash scheme - one where no two three-item combinations give the same hash value, and where the hash-size and table-size are kept to a minimum.

This is basically Zobrist hashing - think of a "move" as adding or removing one item of the combination.

EDIT

The reason to use a hash table is that the lookup performance O(n) where n is the number of items in the combination (assuming no collisions). Calculating lexicographical indexes into the combinations is significantly slower, IIRC.

The downside is obviously the up-front work done to generate the table.

Steve314
I don't agree that lexicographic index generation will be _significantly_ slower than hash. If you have a lookup table of N choose K, finding the lexicographic index is O(k) too and could actually be faster, but who knows, till we measure :-) In fact, we probably don't even need the lookup table if we do it cleverly.
Moron
OK - I confess, I assumed calculating the rank was slower that it is. I should have checked first.
Steve314
@Steve314: You might actually be right, though.
Moron
@Moron: Not really - I didn't say it, but I was thinking the complexity would be something like O(nm) or O(n log m) where m is the number of items to choose from. I guess I could argue that the arithmetic ops are O(log m) - thinking of the needed bit-width for huge sets. Not exactly a *sane* argument, though.
Steve314
I've thought of a (slightly) sane justification for this answer. To calculate the lexicographic rank, you need your selection of items to be sorted. Otherwise, you'd need non-constant-time checks to account for the gaps in your range left by already-handled items. The hash method doesn't need this. The lexicographic rank is therefore `O(n log n)` (due to the sort) if the items aren't already sorted. Counting in Discworld troll, given that n is neither 1 nor 2 but many, then `n log n` must be lots. See - entirely sane ;-)
Steve314
@Steve314: Heh yep, that's true. I solved that particular issue in my hackish way by precomputing the index for any ordering of elements..
Claudiu
+2  A: 

Here is a conceptual answer and a code based on how lex ordering works. (So I guess my answer is like that of "moron", except that I think that he has too few details and his links have too many.) I wrote a function unchoose(n,S) for you that works assuming that S is an ordered list subset of range(n). The idea: Either S contains 0 or it does not. If it does, remove 0 and compute the index for the remaining subset. If it does not, then it comes after the binomial(n-1,k-1) subsets that do contain 0.

def binomial(n,k):
    if n < 0 or k < 0 or k > n: return 0
    b = 1
    for i in xrange(k): b = b*(n-i)/(i+1)
    return b

def unchoose(n,S):
    k = len(S)
    if k == 0 or k == n: return 0
    j = S[0]
    if k == 1: return j
    S = [x-1 for x in S]
    if not j: return unchoose(n-1,S[1:])
    return binomial(n-1,k-1)+unchoose(n-1,S)

def choose(X,k):
    n = len(X)
    if k < 0 or k > n: return []
    if not k: return [[]]
    if k == n: return [X]
    return [X[:1] + S for S in choose(X[1:],k-1)] + choose(X[1:],k)

(n,k) = (13,3)
for S in choose(range(n),k): print unchoose(n,S),S

Now, it is also true that you can cache or hash values of both functions, binomial and unchoose. And what's nice about this is that you can compromise between precomputing everything and precomputing nothing. For instance you can precompute only for len(S) <= 3.

You can also optimize unchoose so that it adds the binomial coefficients with a loop if S[0] > 0, instead of decrementing and using tail recursion.

Greg Kuperberg
ah awesome, makes a lot of sense! would you happen to know a solution for combinations with repetitions? e.g. (0,0,0),(0,0,1),(0,0,2),...,(0,1,1),(0,1,2), etc...
Claudiu
Combinations with repetitions are an equivalent problem. First, you have the formula multibinomial(n,k) = binomial(n+k-1,k). Second, you can divide the combinations into two kinds, those that use 0 and come first, and those that do not use 0 and come after the multibinomial(n,k-1) combinations that do. The code would be very similar and I won't post it. (In fact, there is a standard bijection, called "stars and bars", between (n,k) combinations with repetitions and (n+k-1,k) combinations without repetitions. It preserves lex ordering.)
Greg Kuperberg
@Greg: I think I can figure it out from there - thanks for the clear answer! You explained this in 8 lines of code and a few sentences much better than that entire article.
Claudiu
Thank you for saying so and you're very welcome.
Greg Kuperberg
A: 

For now, I've reached a compromise: I have a 13x13x13 array which just maps to the index of the combination, taking up 13x13x13x2 bytes = 4 kilobytes (using short ints), plus the normal-sized (13 choose 3) * 2 kilobytes = 572 kilobytes, for a total of 576 kilobytes. Much better than 4 megabytes, and also faster than a rank calculation!

I did this partly cause I couldn't seem to get Moron's answer to work. Also this is more extensible - I have a case where I need combinations with repetitions, and I haven't found a way to compute the rank of those, yet.

Claudiu
A: 

What you want are called combinadics. Here's my implementation of this concept, in Python:

def nthresh(k, idx):
  """Finds the largest value m such that C(m, k) <= idx."""
  mk = k
  while ncombs(mk, k) <= idx:
    mk += 1
  return mk - 1


def idx_to_set(k, idx):
  ret = []
  for i in range(k, 0, -1):
    element = nthresh(i, idx)
    ret.append(element)
    idx -= ncombs(element, i)
  return ret


def set_to_idx(input):
  ret = 0
  for k, ck in enumerate(sorted(input)):
    ret += ncombs(ck, k + 1)
  return ret
Nick Johnson