views:

70

answers:

2

For a problem that I'm working on right now, I would like a reasonably uniform random choice from the powerset of a given set. Unfortunately this runs right into statistics which is something that I've not studied at all (something that I need to correct now that I'm getting into real programming) so I wanted to run my solution past some people that know it.

If the given set has size n, then there are (n k) = n!/[k!(n-k)!] subsets of size k, and the total size N of the powerset is given as the sum of (n k) over k from 0 to n. (It's also given as 2^n, but I don't think that's of use here. I could obviously be wrong.)

So my plan is to partition [0, 1] into the intervals:

 [0, (n 0)/N] 

 ((n 0)/N, [(n 0) + (n 1)]/N] 

 ([(n 0) + (n 1)]/N, [(n 0) + (n 1) + (n 2)]/N]

  ... 

 ([N - (n n)]/N, 1]

Algorithmically, each interval is constructed by taking the greatest element of the previous interval as the greatest lower bound of the new interval and adding (n j)/N to it to obtain the new greatest element. I hope that's clear.

I can then figure out how many elements are in the random subset by choosing a uniform float in [0, 1] and mapping it to the index of the interval that it belongs to. From there, I can choose a random subset of the appropriate size.
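The two-step scheme described above can be sketched as follows. This is a minimal illustration, not the asker's actual code; the function name is made up, and it assumes Python 3.8+ for `math.comb`:

```python
import math
import random

def random_subset_by_size(s):
    # First pick a subset size k with probability C(n, k) / 2**n,
    # by mapping a uniform float in [0, 1) onto cumulative intervals.
    items = list(s)
    n = len(items)
    total = 2 ** n  # N, the size of the powerset
    u = random.random()
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k) / total
        if u < cumulative:
            break
    # Then pick a uniform subset of that size.
    return set(random.sample(items, k))

subset = random_subset_by_size({'a', 'b', 'c', 'd', 'e'})
```

Each call returns some subset of the input, with sizes weighted by the binomial coefficients as described.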

  1. I'm pretty sure (from a merely intuitive perspective) that my scheme provides a uniform choice on the size of the subset (uniform relative to the total number of subsets; it's plainly not uniform on the set {0, 1, ..., n} of sizes).

  2. I'm using a library (Python's random.sample) to get the subset of the given size, so I'm confident that that will be uniform.

So my question is if putting the two together in the way I'm describing makes the choice of random subset of random size uniform. If the answer is a lot of work, then I'm happy to accept pointers as to how this might be proven and do the work for myself. Also, if there's a better way to do this, then I would of course be happy with that.

+3  A: 

I think you're going about this the long way. You were close when you mentioned the size of the power set as 2^n. If you want to select a random element of the power set of a set of size n, generate a random integer in the range [0, 2^n) and use the binary representation of the integer to select the appropriate element from the power set.

For example, suppose S = {a, b, c, d, e}. The power set then contains 2^5 = 32 elements. Generate a random number from 0 to 31, for example 18. The binary representation of 18 is 10010, so you would select the first and fourth elements of S. Your random element of the power set is then {a, d}.
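The bit-mask mapping above can be sketched in a few lines; the function name is invented for illustration:

```python
import random

def random_subset(s):
    # Draw a uniform integer in [0, 2**n); each of its n bits decides
    # whether the corresponding element is included. Since every mask
    # corresponds to exactly one subset, the choice is uniform.
    items = list(s)
    n = len(items)
    mask = random.randrange(2 ** n)
    return {items[i] for i in range(n) if mask & (1 << i)}

subset = random_subset({'a', 'b', 'c', 'd', 'e'})
```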

Greg Hewgill
I was seriously over thinking that. Thanks.
aaronasterling
+2  A: 

Consider each element of the given set in turn, and decide with probability 1/2 to include it in the result set.
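A one-line sketch of this per-element coin flip (function name invented for illustration):

```python
import random

def random_subset(s):
    # Each element is included independently with probability 1/2,
    # so every subset has probability (1/2)**n.
    return {x for x in s if random.random() < 0.5}

subset = random_subset({'a', 'b', 'c', 'd', 'e'})
```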

starblue
This is nice too and has the advantage that it would work without having to handle numbers larger than 64 bits in a language like C. Fortunately, Python and Lisp both do that for me. It _feels_ right but like I said, I don't know statistics so I couldn't prove it whereas Greg's answer is trivially uniform as it's just a mapping from a set of equal size that we already have a uniform choice on. Yet another way that I missed the 2^n connection. +1
aaronasterling
This is actually just as trivial to prove uniform, and in fact it does the exact same thing as Greg's method. Consider starblue's answer reworded: with equal probability, pick 0 or 1, n times. This creates an n-bit integer between 0 and 2^n - 1, and each possibility is just as likely as any other due to the fairness of the coin flips. The biggest difference I can see between the two is the number of representable bits, but also the performance of pseudorandom number generators when drawing only one bit of randomness at a time (this can be avoided, though, by doing 2*rand()/RAND_MAX).
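The equivalence of the two answers can be made concrete by assembling the coin flips into an integer; a small sketch (the function name is made up):

```python
import random

def random_mask(n):
    # n fair coin flips, one per bit: this produces the same
    # distribution as random.randrange(2 ** n).
    mask = 0
    for _ in range(n):
        mask = (mask << 1) | random.getrandbits(1)
    return mask

m = random_mask(5)
```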
Jacob Schlather