For a problem that I'm working on right now, I would like a reasonably uniform random choice from the powerset of a given set. Unfortunately this runs right into statistics which is something that I've not studied at all (something that I need to correct now that I'm getting into real programming) so I wanted to run my solution past some people that know it.
If the given set has size n, then there are (n k) = n!/[k!(n-k)!] subsets of size k and the total size N of the powerset is given as the sum of (n k) over k from 0 to n. (also given as 2n but I don't think that that's of use here. I could was obviously be wrong).
So my plan is to partition [0, 1] into the intervals:
[0, (n 0)/N]
((n 0)/N, [(n 0) + (n 1)]/N]
([(n 0) + (n 1)]/N, [(n 0) + (n 1) + (n 2)]/N]
...
([N - (n n)]/N, 1]
Algorithmically, the intervals are constructed by taking the greatest element of the previous interval for the greatest lower bound of the new interval adding (n j)/N to it to obtain the greatest element. I hope that's clear.
I can then figure out how many elements are in the random subset by choosing a uniform float in [0, 1] and mapping it to the index of the interval that it belongs to. From there, I can choose a random subset of the appropriate size.
I'm pretty sure (from a merely intuitive perspective) that my scheme provides a uniform choice on the size of the subset (uniform relative to the total amount of subsets. It's plainly not uniform on the set {1, 2, .., n} of sizes).
I'm using a library (python's
random.sample
) to get the subset of the given size so I'm confident that that will be uniform.
So my question is if putting the two together in the way I'm describing makes the choice of random subset of random size uniform. If the answer is a lot of work, then I'm happy to accept pointers as to how this might be proven and do the work for myself. Also, if there's a better way to do this, then I would of course be happy with that.