I need a function to generate random integers. (Assume the Java long type for now; this will be extended to BigInteger or BitSet later.)

The tricky part is there is a parameter P that specifies the (independent) probability of any bit in the result being 1.

If P = 0.5 then we can just use the standard random number generator. Some other values of P are also easy to implement. Here's an incomplete example:

Random random = new Random();

// ...

long nextLong(float p) {
    if      (p == 0.0f)   return 0L;
    else if (p == 1.0f)   return -1L;
    else if (p == 0.5f)   return random.nextLong();
    else if (p == 0.25f)  return nextLong(0.5f) & nextLong(0.5f);
    else if (p == 0.75f)  return nextLong(0.5f) | nextLong(0.5f);
    else if (p == 0.375f) return nextLong(0.5f) & nextLong(0.75f); // etc
    else {
      // What goes here??
      String message = String.format("P=%f not implemented yet!", p);
      throw new IllegalArgumentException(message);
    }
}

Is there a way to generalise this for any value of P between 0.0 and 1.0?

+1  A: 

Use a random generator that produces a uniform float r between 0 and 1. If r > p, set the bit to 0; otherwise set it to 1.

GregS
That's what I'm trying to avoid. It's too slow for my application.
finnw
ahh, I see what you are getting at. I'll think more about it.
GregS
You aren't going to get orders of magnitude performance improvement by calculating batches of bits together.
Tom Hawtin - tackline
@Tom, actually I do get an order-of-magnitude improvement, at least in the case of p=0.25, because I am generating 4 random words for each output word (assuming the RNG output is 32 bits wide.) If I generate one bit at a time I must generate 64 random words, increasing the cost by a factor of 16.
finnw
+1  A: 

If you're looking to apply some distribution where, with probability P you get a 1 and with probability 1-P you get a 0 at any particular bit, your best bet is simply to generate each bit independently with probability P of being a 1 (that sounds like a recursive definition, I know).

Here's a solution, I'll walk through it below:

public class MyRandomBitGenerator
{

    Random pgen = new Random();

    // assumed p is well conditioned (0 < p < 1)
    public boolean nextBitIsOne(double p){
        return pgen.nextDouble() < p;
    }

    // assumed p is well conditioned (0 < p < 1)
    public long nextLong(double p){
        long nxt = 0;
        for(int i = 0; i < 64; i++){
           if(nextBitIsOne(p)){
               nxt |= 1L << i; // must be 1L: an int shift overflows past bit 31
           }
        }
        return nxt;
    }

}

Basically, we first determine how to generate a 1 with probability P: pgen.nextDouble() returns a number between 0 and 1 with uniform probability. By asking whether that number is less than p, we sample the distribution such that, over many calls, a fraction p of them come up 1.
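(Not part of the original answer: the expected-fraction claim is easy to check empirically. A standalone sketch using java.util.Random directly, with a fixed seed for reproducibility:)

```java
import java.util.Random;

public class BitProbabilityCheck {
    // Fraction of trials in which nextDouble() < p, i.e. the bit came up 1.
    static double observedFraction(double p, int trials, long seed) {
        Random pgen = new Random(seed);
        int ones = 0;
        for (int i = 0; i < trials; i++) {
            if (pgen.nextDouble() < p) ones++; // same test as nextBitIsOne
        }
        return (double) ones / trials;
    }

    public static void main(String[] args) {
        // Should print a value close to 0.25
        System.out.println(observedFraction(0.25, 1_000_000, 42));
    }
}
```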

Mark E
My original implementation looks very similar to this, and is very slow. That's why I decided to try bit manipulation to parallelize the bits within a word. If P is a round number like 0.25, 0.5 or 0.75 it gives a huge performance boost. I'm not sure if this is true for other values of P though.
finnw
+3  A: 

First a little ugly math that you're already using in your code.

Let x and y be bits whose probabilities of being 1 are X = p(x=1) and Y = p(y=1) respectively. Then we have that

 p( x & y = 1) = X Y
 p( x | y = 1) = 1 - (1-X) (1-Y)
 p( x ^ y = 1) = X (1 - Y) + Y (1 - X)

Now if we let Y = 1/2 we get

P( x & y ) = X/2
P( x | y ) = (X+1)/2

Now set the RHS to the probability we want and we have two cases that we can solve for X

X = 2 p        // if we use &
X = 2 p - 1    // if we use |

Next we assume we can use this again to obtain X in terms of another variable Z... And then we keep iterating until we've done "enough".

That's a bit abstract, so consider p = 0.375:

0.375 * 2 = 0.75  < 1.0 so our first operation is &
0.75 * 2 = 1.5 > 1.0 so our second operation is |
0.5 is something we know so we stop.

Thus we can get a variable with p=0.375 by X1 & (X2 | X3 )
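(A quick Monte Carlo sanity check of that claim, not part of the original answer, assuming a standard java.util.Random as the source of uniform words:)

```java
import java.util.Random;

public class CombinedBits {
    // Estimate the per-bit probability of a 1 in x1 & (x2 | x3),
    // where each xi is a uniformly random 64-bit word.
    static double observedFraction(int trials, long seed) {
        Random rnd = new Random(seed);
        long ones = 0;
        for (int t = 0; t < trials; t++) {
            long x1 = rnd.nextLong(), x2 = rnd.nextLong(), x3 = rnd.nextLong();
            ones += Long.bitCount(x1 & (x2 | x3));
        }
        return ones / (64.0 * trials);
    }

    public static void main(String[] args) {
        // Should print a value close to 0.375
        System.out.println(observedFraction(100_000, 1));
    }
}
```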

The problem is that for most values of p this will not terminate, e.g. for p = 0.333:

0.333 *2 = 0.666 < 1.0 so our first operation is &
0.666 *2 = 1.333 > 1.0 so our second operation is |
0.333 *2 = 0.666 < 1.0 so our third operation is &
etc...

so p=0.333 can be generated by

X1 & ( X2 | (X3 & (X4 | ( ... ) ) ) )

Now I suspect that taking enough terms in the series will give you enough accuracy, and this can be written as a recursive function. However, there might be a better way than that too... I think the order of the operations is related to the binary representation of p; I'm just not sure exactly how, and don't have time to think about it more deeply.

Anyway, here's some untested C++ code that does this. You should be able to javaify it easily.

uint bitsWithProbabilityHelper( float p, float tol, int cur_depth, int max_depth );

uint bitsWithProbability( float p )
{
   return bitsWithProbabilityHelper( p, 0.001f, 0, 10 );
}

uint bitsWithProbabilityHelper( float p, float tol, int cur_depth, int max_depth )
{
   uint X = randbits();
   if( cur_depth >= max_depth ) return X;
   if( p < 0.5f - tol )
   {
     return X & bitsWithProbabilityHelper( 2*p, tol, cur_depth+1, max_depth );
   }
   if( p > 0.5f + tol )
   {
     return X | bitsWithProbabilityHelper( 2*p - 1, tol, cur_depth+1, max_depth );
   }
   return X;
}
Michael Anderson
You can probably adjust tol at each step and remove max_depth. Again, just requires a bit more spare brains than I currently have.
Michael Anderson
+1 I had worked about the same thing and was about to test whether it converges
Wim Coenen
Yes I think `cur_depth` is redundant. Multiplying `tol` by a constant factor like `2.001` will have a similar effect.
finnw
@finnw Indeed. In fact the error in the probability from truncating the series after N terms is at most pow( 2, -N ), so scaling tol by 2 every iteration is sensible / correct.
Michael Anderson
I figured out a way to do it iteratively using the binary representation of P. See my second answer.
finnw
Yep figured something similar out also. Will add comments to your solution.
Michael Anderson
+2  A: 

Distribute a proportional number of bits throughout the number. Pseudocode:

long generateNumber( double probability ){
  int bitCount = (int)(64 * probability);
  byte[] data = new byte[64]; // 0-filled

  long indexes = getRandomLong();

  for 0 to bitCount-1 {
    int index;
    do {
      // distribute this bit to some position that is still 0.
      index = (int)(indexes & 63);
      indexes >>= 6;
      if( indexes == 0 ) indexes = getRandomLong();
    } while ( data[index] != 0 );
    data[index] = 1;
  }

  return bytesToLong( data );
}

I hope you get what I mean. Perhaps the byte[] could be replaced with a long and bit operations to make it faster.

Ondra Žižka
I hope I understood the question correctly - do you need to guarantee the number of '1' bits, i.e. some permutation of a fixed set of bits?
Ondra Žižka
Interesting, but doesn't meet the requirement that bits must be independent of each other. This method will always return exactly (64*P) ones in the result. So for example P=1/16 exactly 4 bits are set, so if bits 0-3 are set then all other bits must be clear.
finnw
However if this was combined with a binomial-distribution generator to choose `bitCount` then it could give the right result.
finnw
I have now implemented this as a combination of (a) binomial-distribution generator for `bitCount` and (b) a lookup table of permutations of bits.
finnw
A: 

Here's how I solved it in the end.

  1. Generate an integer N between 0..16, following the binomial distribution. This gives the number of '1' bits in the 16-bit partial result.
  2. Randomly generate an index into a lookup table that contains 16-bit integers containing the desired number of '1' bits.
  3. Repeat 4 times to get four 16-bit integers.
  4. Splice these four 16-bit integers together to get a 64-bit integer.

This was partly inspired by Ondra Žižka's answer.

The benefit is that it reduces the number of calls to Random.nextLong() to 8 per 64 bits of output. For comparison, rolling for each individual bit would require 64 calls, and the bitwise AND/OR method uses between 2 and 32 calls depending on the value of P.

Of course calculating binomial probabilities is just as expensive, so those go in another lookup table.

It's a lot of code, but it's paying off in terms of performance.
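(As a rough sketch of the four steps above, with my own naming, not the actual implementation: here the pattern tables are built by enumerating all 65536 16-bit words up front, and the binomial CDF is computed per instance rather than read from a shared lookup table.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BinomialBitGenerator {
    // patterns[n] holds every 16-bit word with exactly n one-bits
    private final int[][] patterns = new int[17][];
    // cdf[n] = P(popcount <= n) under Binomial(16, p)
    private final double[] cdf = new double[17];
    private final Random rnd;

    public BinomialBitGenerator(double p, long seed) {
        rnd = new Random(seed);
        List<List<Integer>> byCount = new ArrayList<>();
        for (int i = 0; i <= 16; i++) byCount.add(new ArrayList<>());
        for (int w = 0; w < (1 << 16); w++) byCount.get(Integer.bitCount(w)).add(w);
        double c = 0;
        for (int n = 0; n <= 16; n++) {
            patterns[n] = byCount.get(n).stream().mapToInt(Integer::intValue).toArray();
            c += choose(16, n) * Math.pow(p, n) * Math.pow(1 - p, 16 - n);
            cdf[n] = c;
        }
    }

    private static double choose(int n, int k) {
        double r = 1;
        for (int i = 0; i < k; i++) r = r * (n - i) / (i + 1);
        return r;
    }

    // Steps 1 + 2: binomially distributed popcount, then a random pattern with it.
    public int next16() {
        double u = rnd.nextDouble();
        int n = 0;
        while (n < 16 && u > cdf[n]) n++;
        int[] set = patterns[n];
        return set[rnd.nextInt(set.length)];
    }

    // Steps 3 + 4: splice four 16-bit pieces into one 64-bit result.
    public long nextLong() {
        return ((long) next16() << 48) | ((long) next16() << 32)
             | ((long) next16() << 16) | next16();
    }
}
```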

The code


Update - merged this with the bitwise AND/OR solution. It now uses that method if it guesses it will be more efficient (in terms of calls to Random.next().)

finnw
+1  A: 

Here's another variant of Michael Anderson's answer

To avoid recursion, we process the bits of P iteratively from right-to-left instead of recursively from left-to-right. This would be tricky in floating-point representation, so we extract the exponent/mantissa fields from the binary representation instead.

class BitsWithProbabilityHelper {
    public BitsWithProbabilityHelper(float prob, Random rnd) {
        this.rnd = rnd;

        if (prob <= 0f) {
            zero = true;
            return;
        }

        // Decode IEEE float
        int probBits = Float.floatToIntBits(prob);
        mantissa = probBits & 0x7FFFFF;
        exponent = probBits >>> 23;
        if (exponent > 0) mantissa |= 0x800000;
        exponent -= 150;

        // Convert prob to the form m * 2**e
        int ntz = Integer.numberOfTrailingZeros(mantissa);
        mantissa >>= ntz;
        exponent += ntz;
    }

    /** Determine how many random words we need from the system RNG to
     *  generate one output word with probability P.
     **/
    public int iterationCount() {
        return - exponent;
    }

    /** Generate a random number with the desired probability */
    public long nextLong() {
        if (zero) return 0L;

        long acc = -1L;
        int shiftReg = mantissa - 1;
        for (int bit = exponent; bit < 0; ++ bit) {
            if ((shiftReg & 1) == 0) {
                acc &= rnd.nextLong();
            } else {
                acc |= rnd.nextLong();
            }
            shiftReg >>= 1;
        }
        return acc;
    }

    /** Value of <code>prob</code>, represented as m * 2**e */
    private int exponent;  
    private int mantissa;

    /** Random data source */
    private final Random rnd;

    /** Zero flag (special case) */
    private boolean zero;
}
finnw
This looks good. The only disadvantage compared with my method is that you can't specify a tolerance on the probability to limit the number of random numbers generated. However, it would be easy to clip the bits of the mantissa, which would give a better result than using tol, with a guaranteed bound on running time.
Michael Anderson
Actually there's something wrong with that inner loop for very small p. I think -exponent can be bigger than the length of the shiftReg, and in those cases you do more work than is necessary. I'm also concerned that it might be doing the wrong thing for denormalised numbers. Maybe you could do a long binrep = p * 0xFFFFFFFF and loop through the bits in binrep rather than playing with the bits of a float? Then you'd have a fixed-length loop and a known maximum error in the approximation.
Michael Anderson
You can apply a tolerance by calling `iterationCount()` first and if the answer is too high use a different method. The smallest non-zero value for a single-precision float is 2**-149, so the worst-case iteration count would be 149. Denormalised numbers are handled in line 9 (`mantissa |= 0x800000` restores the implicit 1 bit of normal numbers, this is skipped for denormalised numbers.) If -exponent is huge then unfortunately you do need to loop that many times in order to get the probability down, even though the shiftReg will be zero for most iterations.
finnw
It wasn't handling zero correctly though. I've added a special case for P=0.
finnw