I'm looking for a PRNG (pseudorandom number generator) that can be seeded initially with an arbitrary array of bytes.

Heard of any?

+1  A: 

Why don't you just XOR your arbitrary sequence into a type of the right length (padding it with part of itself if necessary)? For example, if you want the seed "paxdiablo" and your PRNG has a four-byte seed:

paxd    0x70617864
iabl    0x6961626c
opax    0x6f706178
        ----------
        0x76707b70, or 0x707b7076 (little-endian, as on Intel).

I know that seed looks artificial (and it is, since the key is made up of alphabetic characters). If you really want to spread the seeds out when the phrases are likely to come from a similar range of characters, XOR the result again with a differentiator such as 0xdeadbeef or 0xa55a1248:

paxd    0x70617864    0x70617864
iabl    0x6961626c    0x6961626c
opax    0x6f706178    0x6f706178
        0xdeadbeef    0xa55a1248
        ----------    ----------
        0xa8ddc59f    0xd32a6938

I prefer the second one since it will more readily move similar bytes into disparate ranges (the upper bits of the bytes in the differentiator are disparate).
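The folding described above can be sketched in a few lines of Python. This is only an illustration: the function name `fold_seed` and its parameters are made up for this example, and it assumes big-endian packing, which matches the first result in the tables above.

```python
from itertools import cycle, islice


def fold_seed(phrase: bytes, width: int = 4, differentiator: int = 0) -> int:
    """XOR-fold `phrase` into a `width`-byte seed (big-endian packing).

    `fold_seed`, `width` and `differentiator` are hypothetical names
    chosen for this sketch.
    """
    if not phrase:
        raise ValueError("phrase must not be empty")
    # Round the length up to a multiple of `width`, padding with the
    # start of the phrase itself ("paxdiablo" -> "paxdiablopax").
    padded_len = -(-len(phrase) // width) * width
    padded = bytes(islice(cycle(phrase), padded_len))
    # Start from the differentiator, masked to the seed width.
    seed = differentiator & ((1 << (8 * width)) - 1)
    for i in range(0, padded_len, width):
        seed ^= int.from_bytes(padded[i:i + width], "big")
    return seed


print(hex(fold_seed(b"paxdiablo")))                             # 0x76707b70
print(hex(fold_seed(b"paxdiablo", differentiator=0xdeadbeef)))  # 0xa8ddc59f
print(hex(fold_seed(b"paxdiablo", differentiator=0xa55a1248)))  # 0xd32a6938
```

Note that the result is always only `width` bytes wide, so many different phrases necessarily map to the same seed.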

paxdiablo
Thanks, I thought of this. However, this effectively works as a hash function. So while my input is more diverse, I reduce it to a smaller window and increase the chance of a collision. I'd like to avoid this. I've seen PRNG implementations that work with a backing table which is generated from a smaller seed. But I should be able to generate this same table with any byte array as input. Bigger input, less chance of a collision, right? That's what I want.
John Leidegren
+2  A: 

Hashing your arbitrary-length seed (instead of using XOR as paxdiablo suggested) will ensure that collisions are extremely unlikely: the chance of two seeds colliding equals the probability of a hash collision, and with something such as SHA-1 or SHA-2 an accidental collision is a practical impossibility.

You can then use your hashed seed as the input to a decent PRNG such as my favourite, the Mersenne Twister.
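As a minimal sketch of that idea, here is how it might look in Python, whose standard `random.Random` class is a Mersenne Twister (MT19937) in CPython. The helper name `prng_from_bytes` is invented for this example, and the choice of SHA-256 is an assumption; any SHA-2 variant would work the same way.

```python
import hashlib
import random


def prng_from_bytes(seed_bytes: bytes) -> random.Random:
    """Return a Mersenne Twister seeded from the SHA-256 hash of `seed_bytes`.

    `prng_from_bytes` is a hypothetical helper name for this sketch.
    """
    digest = hashlib.sha256(seed_bytes).digest()  # always 32 bytes
    # random.Random accepts an arbitrarily large int as its seed.
    return random.Random(int.from_bytes(digest, "big"))


a = prng_from_bytes(b"paxdiablo")
b = prng_from_bytes(b"paxdiablo")
c = prng_from_bytes(b"paxdiablp")

# Same seed bytes give the same stream; a one-byte change gives another.
print([a.randrange(100) for _ in range(5)])
print([b.randrange(100) for _ in range(5)])
print([c.randrange(100) for _ in range(5)])
```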

UPDATE

The Mersenne Twister implementation available here already seems to accept an arbitrary length key: http://code.msdn.microsoft.com/MersenneTwister/Release/ProjectReleases.aspx?ReleaseId=529
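I can't vouch for how the linked implementation handles its key, but the reference Mersenne Twister code has an `init_by_array` routine for exactly this purpose, and CPython's `random` module uses it when seeded with an integer of any size. A rough Python sketch of seeding straight from the bytes, with no hashing step, might look like this (the name `prng_from_key` is made up for the example):

```python
import random


def prng_from_key(key: bytes) -> random.Random:
    """Seed a Mersenne Twister from every byte of `key`, without hashing.

    `prng_from_key` is a hypothetical helper name for this sketch.
    """
    # Prefix a 0x01 byte so that leading zero bytes in `key` still change
    # the integer; otherwise b"\x00abc" and b"abc" would give the same seed.
    return random.Random(int.from_bytes(b"\x01" + key, "big"))


long_key_1 = bytes(1000)                # 1000 zero bytes
long_key_2 = bytes(999) + b"\x01"       # differs only in the last byte

r1 = prng_from_key(long_key_1)
r2 = prng_from_key(long_key_2)
print(r1.random(), r2.random())
```

The generator's internal state is finite (624 32-bit words), so keys longer than that still get mixed down into it; they cannot all map to distinct states.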

UPDATE 2

To get a feel for just how unlikely a SHA-2 collision is, consider how hard someone would have to work to attack it deliberately. Quoting http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-2 :

There are two meet-in-the-middle preimage attacks against SHA-2 with a reduced number of rounds. The first one attacks 41-round SHA-256 out of 64 rounds with time complexity of 2^253.5 and space complexity of 2^16, and 46-round SHA-512 out of 80 rounds with time 2^511.5 and space 2^3. The second one attacks 42-round SHA-256 with time complexity of 2^251.7 and space complexity of 2^12, and 42-round SHA-512 with time 2^502 and space 2^22.
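The attacks quoted above are deliberate ones. For accidental collisions between seeds, the usual birthday approximation gives a rough number. The sketch below assumes the hash behaves like a uniformly random function; the function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(n_seeds: int, hash_bits: int) -> float:
    """Approximate chance that any two of `n_seeds` distinct inputs collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming the hash output is uniformly distributed.
    """
    exponent = n_seeds * (n_seeds - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)


# A 32-bit seed (like the XOR-folded one) reaches about a 50% collision
# chance after roughly 77,000 distinct phrases.
print(collision_probability(77_163, 32))

# A trillion distinct seeds hashed with SHA-256 are still astronomically
# unlikely to collide by accident.
print(collision_probability(10**12, 256))
```

This also shows why folding the seed down to four bytes is the weak point: the width of the seed, not the length of the input, determines how soon collisions appear.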

chillitom
Sometimes you get lucky. I was considering the Mersenne Twister all along. Wasn't sure about the implementation though. Thanks! The code needs some touching up, and inheriting from the existing Random class is just stupid, but besides that it looks great!
John Leidegren
SHA-256 gives 2^256 different possible hashes. To give you an idea of how large that is, there are around 2^265 atoms in the known universe.
BlueRaja - Danny Pflughoeft