tags:

views:

133

answers:

2

Hi,

I am trying to generate random floats using nothing but bytes I get from /dev/urandom. Currently my best idea is to get the platform precision doing something like:

$maximumPrecision = strlen('' . 1/3) - 2;

and then construct a string of 0-9 in a loop the number of times $maximumPrecision tells us. For examle, if the precision is 12, I would generate 12 random numbers and concatenate them. I think it's an ugly idea.

Update: Does this make sense?

$bytes =getRandomBytes(7); // Just a function that returns random bytes.
$bytes[6] = $bytes[6] & chr(15); // Get rid off the other half
$bytes .= chr(0); // Add a null byte

$parts = unpack('V2', $bytes);

$number = $parts[1] + pow(2.0, 32) * $parts[2];
$number /= pow(2.0, 52);
+4  A: 

PHP's float type typically is implemented as IEEE double. This format has 52 bit precision of mantissa, so in principle it should be able to generate 252 different uniform numbers in [0, 1).

Therefore, you could extract 52 bits from the /dev/urandom, interpret as an integer, and divide by 252. For example:

// assume we have 52 bits of data, little endian.
$bytes = "\x12\x34\x56\x78\x9a\xbc\x0d\x00";
//                                  ^   ^^ 12 bits of padding.

$parts = unpack('V2', $bytes);

$theNumber = $parts[1] + pow(2.0, 32) * $parts[2];  // <-- this is precise.
$theNumber /= pow(2.0, 52);                         // <-- this is precise.
KennyTM
Yep. Although PHP follows C89, which doesn't require a IEEE double for the `double` type, it will be hard to find a platform where it isn't the case (though they may not support the standard fully).
Artefacto
Interesting, what does `interpret as an integer` actually mean in this case? I take 7 bytes (56-bits) and ...
rFactor
@rFactor: See update. You could also use "bit-shifts" to construct the raw double representation and `unpack`, but that's quite a mess in PHP.
KennyTM
When I read /dev/urandom, I read bytes, not bits. So the closest number of bytes I can read is 7 = 56-bits. How do I truncate and get rid of those extra bits?
rFactor
@rFactor: In the first round, read 7 bytes, and take off the top 4 bits. In the second round, read 6 bytes, and put that saved 4 bits back.
KennyTM
@KennyTM This won't work if an integer is 32-bit long.
Artefacto
@Artefacto: No, those are floating point arithmetic.
KennyTM
@Kenny Right. I saw `V2` and was confused :p
Artefacto
@KennyTM: Sorry, but I'm a bit confused with two things. A) I understand that we need to have the padding there for unpack to catch us, but why did u choose 0x0d and 0x00 exactly? As far as I can see its 13 + 0 = 13 -- didn't we want 12 bits of padding? B) What does the `pow(2.0, 32) * parts[2]` really mean? I know what it does but I don't understand why are you doing this exactly. 2**32 * UINT32 (that was padded)? Thanks for the help man, I just need to clear this up in my mind..
rFactor
@rFactor: (A) This is little endian. So we add 12 '0's to the end of the most significant bits. (B) `parts[1]` is the lower 32-bit, `parts[2]` is the upper 20-bit.
KennyTM
@KennyTM: Damn, I'm having hard time understanding this. Do you have any links to some resources about endianess and bits that I could read and perhaps understand your solution later on?
rFactor
@KennyTM: Does the code in my updated post make sense to you? I think I'm getting a bit of a hang on it now. :)
rFactor
@rFactor: Sorry for late reply — the updated code will throw away 8 bits of entropy every 2 runs. You need to keep the "other half" of byte 6. The next time you run it, take only 6 bytes and put the saved "other half" back.
KennyTM
@KennyTM: Oh, I see. I'll do it. :)
rFactor
+1  A: 

The problem here is that the IEEE double precision number is defined in terms of exponents of base 2:

n = 2^exponent * 1.mantissa

Since you want an exponent -1, and there's no integer number n such that 2^n = 0.1, it gets complicated.

This generates a number between 1 and 2. You can subtract 1, but you'll lose a tiny amount of entropy if you do that (KennyTM's answer yields a number in that range, though, and uses all entropy -- this answer attempts to create the representation directly):

$source = fopen("/dev/urandom", "rb");

//build big-endian double
//take only 32 bits because of 32-bit platforms
$byte_batch_1 = fread($source, 4); //32-bit
$byte_batch_2 = fread($source, 4); //32-bit, we only need 20

$offset = (1 << 10) -1;

$first_word = unpack("N", $byte_batch_2);
$first_word = reset($first_word);
$first_word &= 0xFFFFF; //leave only 20 lower bits
$first_word |= $offset << 20;

$str = pack("N", $first_word) . $byte_batch_1;

//convert to little endian if necessary
if (pack('s', 1) == "\x01\x00") { //little-endian
    $str = strrev($str);
}

$float = unpack("d", $str);
$float = reset($float);
Artefacto