ansaurus

Question

How to implement a random float function so that it does not lose entropy? (PHP)

Answer 1

+4 A:

PHP's float type typically is implemented as IEEE double. This format has 52 bit precision of mantissa, so in principle it should be able to generate 2⁵² different uniform numbers in [0, 1).

Therefore, you could extract 52 bits from the /dev/urandom, interpret as an integer, and divide by 2⁵². For example:

// assume we have 52 bits of data, little endian.
$bytes = "\x12\x34\x56\x78\x9a\xbc\x0d\x00";
//                                  ^   ^^ 12 bits of padding.

$parts = unpack('V2', $bytes);

$theNumber = $parts[1] + pow(2.0, 32) * $parts[2];  // <-- this is precise.
$theNumber /= pow(2.0, 52);                         // <-- this is precise.

KennyTM 2010-09-11 16:02:28

Yep. Although PHP follows C89, which doesn't require a IEEE double for the `double` type, it will be hard to find a platform where it isn't the case (though they may not support the standard fully).

Artefacto 2010-09-11 16:08:07

Interesting, what does `interpret as an integer` actually mean in this case? I take 7 bytes (56-bits) and ...

rFactor 2010-09-11 16:13:13

@rFactor: See update. You could also use "bit-shifts" to construct the raw double representation and `unpack`, but that's quite a mess in PHP.

KennyTM 2010-09-11 16:38:22

When I read /dev/urandom, I read bytes, not bits. So the closest number of bytes I can read is 7 = 56-bits. How do I truncate and get rid of those extra bits?

rFactor 2010-09-11 16:50:36

@rFactor: In the first round, read 7 bytes, and take off the top 4 bits. In the second round, read 6 bytes, and put that saved 4 bits back.

KennyTM 2010-09-11 16:55:16

@KennyTM This won't work if an integer is 32-bit long.

Artefacto 2010-09-11 17:05:38

@Artefacto: No, those are floating point arithmetic.

KennyTM 2010-09-11 17:15:37

@Kenny Right. I saw `V2` and was confused :p

Artefacto 2010-09-11 17:35:29

@KennyTM: Sorry, but I'm a bit confused with two things. A) I understand that we need to have the padding there for unpack to catch us, but why did u choose 0x0d and 0x00 exactly? As far as I can see its 13 + 0 = 13 -- didn't we want 12 bits of padding? B) What does the `pow(2.0, 32) * parts[2]` really mean? I know what it does but I don't understand why are you doing this exactly. 2**32 * UINT32 (that was padded)? Thanks for the help man, I just need to clear this up in my mind..

rFactor 2010-09-12 07:38:13

@rFactor: (A) This is little endian. So we add 12 '0's to the end of the most significant bits. (B) `parts[1]` is the lower 32-bit, `parts[2]` is the upper 20-bit.

KennyTM 2010-09-12 07:49:25

@KennyTM: Damn, I'm having hard time understanding this. Do you have any links to some resources about endianess and bits that I could read and perhaps understand your solution later on?

rFactor 2010-09-12 08:37:01

@KennyTM: Does the code in my updated post make sense to you? I think I'm getting a bit of a hang on it now. :)

rFactor 2010-09-12 15:43:16

@rFactor: Sorry for late reply — the updated code will throw away 8 bits of entropy every 2 runs. You need to keep the "other half" of byte 6. The next time you run it, take only 6 bytes and put the saved "other half" back.

KennyTM 2010-09-16 16:10:37

@KennyTM: Oh, I see. I'll do it. :)

rFactor 2010-09-17 18:47:06

Answer 2

+1 A:

The problem here is that the IEEE double precision number is defined in terms of exponents of base 2:

n = 2^exponent * 1.mantissa

Since you want an exponent -1, and there's no integer number n such that 2^n = 0.1, it gets complicated.

This generates a number between 1 and 2. You can subtract 1, but you'll lose a tiny amount of entropy if you do that (KennyTM's answer yields a number in that range, though, and uses all entropy -- this answer attempts to create the representation directly):

$source = fopen("/dev/urandom", "rb");

//build big-endian double
//take only 32 bits because of 32-bit platforms
$byte_batch_1 = fread($source, 4); //32-bit
$byte_batch_2 = fread($source, 4); //32-bit, we only need 20

$offset = (1 << 10) -1;

$first_word = unpack("N", $byte_batch_2);
$first_word = reset($first_word);
$first_word &= 0xFFFFF; //leave only 20 lower bits
$first_word |= $offset << 20;

$str = pack("N", $first_word) . $byte_batch_1;

//convert to little endian if necessary
if (pack('s', 1) == "\x01\x00") { //little-endian
    $str = strrev($str);
}

$float = unpack("d", $str);
$float = reset($float);

Artefacto 2010-09-11 17:48:10

ansaurus

tags:

views:

answers:

How to implement a random float function so that it does not lose entropy? (PHP)

related questions