views:

740

answers:

5

Some background: I'm writing a more or less brute force search algorithm for solving a problem that I have. In order to do this, I need to generate and evaluate all possibilities to find out which is best. Since the evaluation actually takes some time I would prefer to generate as little as possible solutions that completely cover my search space. Furthermore, the more elements I can do this for, the better. For any number K there are normally K! permutations, and generating them all will be hard for numbers higher than ~10.

Real problem: The search space should contain all permutations of two elements (N times el1 and M times el2, where K=M+N), with these restrictions:

  1. they have to be unique (i.e. I only want [a a b b b] once)
  2. I don't need the reverse of any permutation (i.e. if I have [a a b], I don't also need [b a a])
  3. I consider the permutations to be circular, so [a a b] = [a b a] = [b a a]

If I would be able to do this, the number of possibilities would be decreased drastically. Since K will ideally be large, it is not feasible to first generate all permutations and then filter them according to these criteria. I have already done the first restriction (see below) and it cut back the number from 2^K for Matlab's normal permutations function (perms) to K!/N!M!, which is a huge win. The second restriction will only cut the number of possiblities in half (in the best case), but I think the third should also be able to really cut down the number of possibilities.

If anyone knows how to do it, and preferably also how to calculate how many possibilities there will be, that would help me a lot! I would prefer an explanation, but code is also fine (I can read C-like languages, Java(Script), Python, Ruby, Lisp/Scheme).


For the interested: Here is the algorithm for getting only unique permutations that I have so far:

function genPossibilities(n, m, e1, e2)
     if n == 0
         return array of m e2's
     else
         possibilities = genPossibilities(n-1, m, e1, e2)
         for every possibility:
             gain = number of new possibilities we'll get for this smaller possibility*
             for i in max(0,(m+n-gain))
                 if possibility(i) is not e1
                     add possiblity with e1 inserted in position i
         return new possibilities
  • If you have all permutations for N-1 and M, then you can use them to find the permutations for N and M by inserting e1 into them. You can't just insert everywhere though, because then you'll get duplicates. I don't know why this works, but you can calculate the number of new possibilities that you'll generate from an old one (I call this 'gain'). This number starts at M+1 for the first old permutation and decreases by one for each old permutation until it would become zero, at which point it goes back to M, etc. (only works if M>=N). So if you want to calculate the permutations for N=3 and M=3 and you have the 10 permutations for N=2 and M=3, their gains will be [4 3 2 1 3 2 1 2 1 1]. Subtract this gain from the length of the permutation and you get the index at which you can start inserting new elements without making duplicates.
A: 

If you have only two elements, your space is much smaller: 2^k rather than k!.

Try an approach like this:

  1. Run through all numbers from 1 through 2^k.
  2. Write the number in binary form.
  3. Translate all 0s to a and 1s to b. Now you have a permutation.
  4. Take your sequence and generate 2k sequences by cyclic permutations and reversal. You only need to evaluate 1 of these 2k sequences.
  5. Among the 2k sequences, choose the one that sorts first alphabetically.
  6. Check your log to see if you've already done this one. If so, skip.
  7. If this one is new, evaluate it, and add to the "done" log. (If space permits, you could add all 2k elements of the "family" to the done log, so you could move step (6) to right after step (3). You could also store the number, rather than the sequence of a's and b's, in the "done" log.)

If you have j possible symbols, rather than just two, do the same thing but use base j rather than base 2.

Mike Kantor
He wanted k-tuples of 2 elements, not pairs of k elements. So 2^k is correct.
job
Thanks for your reply. Actually, steps 1-3 were my first try, but they will generate all 2^k possibilities. The rest doesn't seem to be very efficient either. I still have to do something for all of the 2^k possibilities and there is significant work added (for every one I need to mirror and shift it a couple of times, then sort the result and compare it to some big log that I have to keep).
Jordi
A: 

You are looking for combinations - which are order independent. Matlab calculated this correctly with K!/N!M! which is precisely the formula for calculating the number of combinations.

Michael Donohue
But the OP does care about order - [ a b a b b b b ] is different to [a b b a b b b ].
caf
Ah. The 'circular' example needs to use more than three elements to make a point then.
Michael Donohue
No, I already know the combination, because I know how much of each element I want. I am interested in the order, it is just that a lot of orders look the same to me. For M=3 and N=2 there are 10 unique combinations, I only want the *'s:1 aabbb* <- unique2 ababb* <- unique3 abbab <- reverse+shift right to get 24 abbba <- shift right to get 15 baabb <- shift left to get 16 babab <- shift left to get 27 babba <- shift right to get 28 bbaab <- shift 2*left to get 19 bbaba <- reverse to get 20 bbbaa <- reverse to get 1
Jordi
Restriction #3 does not adequately demonstrate this. I'd suggest showing an example with k=4 and an example of a non-equivalent sequence
Michael Donohue
A: 

Assuming you have an array of all permutations, you can put the contents of the array into a hash. Then this will work (a little brute force, but its a start):

for each (element in array of permutations){
  if (element exists in hash){
    remove each circular permutation of element in hash except for element itself
  }
}
+1  A: 

I think you want to generate 2-ary free necklaces. See this question for link, papers, and some code.

Jay Kominek
That question appears to be focussed on fixed necklaces rather than free necklaces / bracelets (it could also be true that the special case of k=2 might lead to some significant optimisations being possible).
caf
Item #2 on the list seems to suggest bracelets. Only wanting the original or the reverse (not both) means treating them as equivalent. (Right? Am I misreading #2?)
Jay Kominek
Sorry I wasn't clear - you're right that this question is about bracelets, but what I meant was that the question you linked to is about fixed necklaces.
caf
A: 

What you are after is a subset of 2-ary bracelets (the subset is defined by exactly n of character A and m of character B). The set of all bracelets allows for the number of A's and B's to vary.

The following code prints out the sequences you are after, and does so in lexical order and in constant amortised time. It is based on the general algorithm in this paper by Sawada - for an explanation of how it works, see that paper.

#include <stdlib.h>
#include <stdio.h>

static int *a;
static int n;

void print_bracelet(int n, int a[])
{
    int i;

    printf("[");
    for (i = 0; i < n; i++)
        printf(" %c", 'a' + a[i]);
    printf(" ]\n");
}

int check_rev(int t, int i)
{
    int j;

    for (j = i+1; j <= (t + 1)/2; j++)
    {
        if (a[j] < a[t-j+1])
            return 0;
        if (a[j] > a[t-j+1])
            return -1;
    }

    return 1;
}

void gen_bracelets(int n_a, int n_b, int t, int p, int r, int u, int v, int rs)
{
    if (2 * (t - 1) > (n + r))
    {
        if (a[t-1] > a[n-t+2+r])
            rs = 0;
        else if (a[t-1] < a[n-t+2+r])
            rs = 1;
    }
    if (t > n)
    {
        if (!rs && (n % p) == 0)
            print_bracelet(n, a + 1);
    }
    else
    {
        int n_a2 = n_a;
        int n_b2 = n_b;

        a[t] = a[t-p];

        if (a[t] == 0)
            n_a2--;
        else
            n_b2--;

        if (a[t] == a[1])
            v++;
        else
            v = 0;

        if ((u == (t - 1)) && (a[t-1] == a[1]))
            u++;

        if ((n_a2 >= 0) && (n_b2 >= 0) && !((t == n) && (u != n) && (a[n] == a[1])))
        {
            if (u == v) {
                int rev = check_rev(t, u);

                if (rev == 0)
                    gen_bracelets(n_a2, n_b2, t + 1, p, r, u, v, rs);

                if (rev == 1)
                    gen_bracelets(n_a2, n_b2, t + 1, p, t, u, v, 0);
            }
            else
                gen_bracelets(n_a2, n_b2, t + 1, p, r, u, v, rs);
        }

        if (u == t)
            u--;

        if (a[t-p] == 0 && n_b > 0)
        {
            a[t] = 1;

            if (t == 1)
                gen_bracelets(n_a, n_b - 1, t + 1, t, 1, 1, 1, rs);
            else
                gen_bracelets(n_a, n_b - 1, t + 1, t, r, u, 0, rs);
        }
    }
}

int main(int argc, char *argv[])
{
    int n_a, n_b;

    if (argc < 3)
    {
        fprintf(stderr, "Usage: %s <a> <b>\n", argv[0]);
        return -2;
    }

    n_a = atoi(argv[1]);
    n_b = atoi(argv[2]);

    if (n_a < 0 || n_b < 0)
    {
        fprintf(stderr, "a and b must be nonnegative\n");
        return -3;
    }

    n = n_a + n_b;
    a = malloc((n + 1) * sizeof(int));

    if (!a)
    {
        fprintf(stderr, "could not allocate array\n");
        return -1;
    }

    a[0] = 0;

    gen_bracelets(n_a, n_b, 1, 1, 0, 0, 0, 0);

    free(a);
    return 0;
}
caf