views:

1049

answers:

6

Hello,

I have a problem resembling the one described here:

http://stackoverflow.com/questions/127704/algorithm-to-return-all-combinations-of-k-elements-from-n

I am looking for something similar that covers all possible combinations of k from n. However, I need a subset to vary a lot from the one drawn previously. For example, if I were to draw a subset of 3 elements from a set of 8, the following algorithm wouldn't be useful to me since every subset is very similar to the one previously drawn:

11100000, 11010000, 10110000, 01110000, ...

I am looking for an algorithm thats picks the subsets in a more "random" looking fashion, ie. where the majority of elements in one subset is not reused in the next:

11100000, 00010011, 00101100, ...

Does anyone know of such an algorithm?

I hope my question made sence and that someone can help me out =)

Kind regards,

Christian

A: 

How about randomly choosing the k elements. ie choose the pth where p is random between 1 and n, then reorder what's left and choose the qth where q is between 1 and n-1 etc?

or maybe i misunderstood. do you still want all possibilities? in that case you can always generate them first and then choose random entries from your list

As I also said to AnnaR, I am actually trying to speed up a process that currently uses RNG. Also k and n are to big for all combinations to be precalculated and memorized. I therefore thought that maybe a combination generator that calculated the subsets on the run could solve the problem. However, there needs to be substancial variation in each subset. Such an algorithm might not exist but I thought I might as well give it a try and ask here =)
+1  A: 

How about first generating all possible combinations of k from n, and then rearranging them with help of a random function.

If you have the result in a vector, loop through the vector: for each element let it change place with the element at a random position.

This of course becomes slow for large k and n.

AnnaR
The thing is, I am working with very large k and n, so large that generating and memorizing all possible combinations of k from n is not an option. The subsets therefore need to be calculated on the run.The algorithm I am working with right now extracts the subsets randomly using a RNG. This works, but generating pseudo-random numbers takes a bit of calculation + there is a small chance of picking the same element twice. I thought maybe I could speed up the process by using a combination generator instead, but as I stated above, there needs to be great variation from subset to subset.
A: 
bubaker
Lexicographically distant is what I mean, yes. And this applies to all previous combinations. I know that picking more combinations reduces the largest possible lexicographic distance for the next subset. And I don't require the subset to be as lexicographically distant from the previous subsets as possible. The distance should be more than minimal though. I am beginning to think that this is impossible to do without using a RNG, which is what I am trying to get away from. Furthermore, as I already stated, my n and k are too large for storing all combinations.
+1  A: 

This is not really random, but depending on your needs it might suit you.

  1. Calculate the number of possible combinations. Let's name them N.
  2. Calculate a large number which is coprime to N. Let's name it P.
  3. Order the combinations and give them numbers from 1 to N. Let's name them C1 to CN
  4. Iterate for output combinations. The first one will be VP mod N, the second one will be C2*P mod N, the third one C3*P mod N, etc. In essence, Outputi = Ci*P mod N. Mod is meant as the modulus operator.

If P is picked carefully, you will get seemingly random combinations. Values close to 1 or to N will produce values that differ little. Better pick values close to, say N/4 or N/5. You can also randomize the generation of P for every iteration that you need.

Vilx-
This would work if I didn't have to precalculate and store all possible subsets. But I'll try to see if I can find a combination generator that lets me calculate the n'th subset without having to calculate all the n-1 subsets preceding it. Thank you for the help =)
You beat me to it. By the way, step 3 is implicit in my opinion: using a running counter does exactly that.
RaphaelSP
@Christian: you don't actually have to generate the n-1 combinations to obtain the nth. Look for the unranking function for the minimal-change algorithm. All I could find for now is that: http://www.jjj.de/fxt/#fxt. I'll post something more concise this evening, when I find my implementation.
RaphaelSP
Thank you, I'll look at it.
Yes, I originally didn't have Step 3 either, but then I realized that I needed to give names to the combinations to be able to reference them later.
Vilx-
A: 

As a follow-up to my comment on this answer, here is some code that allows one to determine the composition of a subset from its "index", in colex order. Shamelessly stolen from my own assignments.

//////////////////////////////////////
// NChooseK
//
// computes    n!
//          --------
//          k!(n-k)!
//
// using Pascal's identity
// i.e. (n,k) = (n-1,k-1) + (n-1,k)
//
// easily optimizable by memoization
long long NChooseK(int n, int k)
{
    if(k >= 0 && k <= n && n >= 1)
    {
        if( k > n / 2)
            k = n - k;
        if(k == 0 || n == 0)
            return 1;
        else
            return NChooseK(n-1, k-1) + NChooseK(n-1, k);
    } 
    else
        return 0;
}

///////////////////////////////////////////////////////////////////////
// SubsetColexUnrank
//  The unranking works by finding each element
//  in turn, beginning with the biggest, leftmost one.
//  We just have to find, for each element, how many subsets there are
//  before the one beginning with the elements we have already found.
//
// It stores its results (indices of the elements present in the subset) into T, in ascending order.
void SubsetColexUnrank(long long r, int * T, int subsetSize)
{
    assert( subsetSize >= 1 );

    // For each element in the k-subset to be found
    for(int i = subsetSize; i >= 1; i--)
    {
        // T[i] cannot be less than i
        int x = i;

        // Find the smallest element such that, of all the k-subsets that contain it,
        // none has a rank that exceeds r.            
        while( NChooseK(x, i) <= r )
            x++;

        // update T with the newly found element
        T[i] = x;

        // if the subset we have to find is not the first one containing this element
        if(r > 0)
        {
            // finding the next element of our k-subset
            // is like finding the first one of the same subset
            // divided by {T[i]}
            r -= NChooseK(x - 1, i);
        }
    }
}

Random-in, random-out.

The colex order is such that its unranking function does not need the size of the set from which to pick the elements to work; the number of elements is assumed to be NChooseK(size of the set, size of the subset).

RaphaelSP
A: 

Depending on what you are trying to do, you could do something like playing cards. Keep two lists: Source is your source (unused) list; and Used the second is the "already-picked" list. As you randomly pick k items from Source, you move them to your Used list.

If there are k items left in Source when you need to pick again, you pick them all and swap the lists. If there are fewer than k items, you pick j items from Used and add them to Source to make k items in Source, then pick them all and swap the lists.

This is kind of like picking k cards from a deck. You discard them to the used pile. Once you reach the end or need more cards, you shuffle the old ones back into play.

This is just to make sure each set is definitely different from the previous subsets. Also, this will not really guarantee that all possible subsets are picked before old ones start being repeated.

The good is that you don't need to worry about pre-calculating all the subsets, and your memory requirements are linear with your data (2 n-sized lists).

Erich Mirabal