views:

842

answers:

6

I understand how standard random number generators work. But when working with crytpography, the random numbers really have to be random.

I know there are instruments that read cosmic white noise to help generate secure hashes, but your standard PC doesn't have this.

How does a cryptographically secure random number generator get its values with no repeatable patterns?

+2  A: 

Each generator will use its own seeding strategy, but here's a bit from the Windows API documentation on CryptGenRandom

With Microsoft CSPs, CryptGenRandom uses the same random number generator used by other security components. This allows numerous processes to contribute to a system-wide seed. CryptoAPI stores an intermediate random seed with every user. To form the seed for the random number generator, a calling application supplies bits it might have—for instance, mouse or keyboard timing input—that are then combined with both the stored seed and various system data and user data such as the process ID and thread ID, the system clock, the system time, the system counter, memory status, free disk clusters, the hashed user environment block. This result is used to seed the pseudorandom number generator (PRNG).

In Windows Vista with Service Pack 1 (SP1) and later, an implementation of the AES counter-mode based PRNG specified in NIST Special Publication 800-90 is used. In Windows Vista, Windows Storage Server 2003, and Windows XP, the PRNG specified in Federal Information Processing Standard (FIPS) 186-2 is used. If an application has access to a good random source, it can fill the pbBuffer buffer with some random data before calling CryptGenRandom. The CSP then uses this data to further randomize its internal seed. It is acceptable to omit the step of initializing the pbBuffer buffer before calling CryptGenRandom.

Jekke
Ooh, nifty. Handy to know how it's done in Windows-land too, thank you.
Richard Barrell
A: 

No doubt there are some people here who are well-versed in the complexities of "true" random-number generation, but if you're just after general information... have you checked this Wikipedia page?

There's some decent info there about computational methods.

Tom
−1. If you would have linked to http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator it'd have been ok but conventional PRNGs are *not* suitable for cryptography. Heck, many methods discussed in that section shouldn't even be used anymore.
Joey
I stand corrected.
Tom
+5  A: 

First of all, the point of a cryptographically secure PRNG is not to generate entirely unpredictable sequences. As you noted, the absence of something that generates large volumes of (more or less) true randomness1 makes that impossible.

So you resort to something which is only hard to predict. “Hard” meaning here that it takes unfeasibly long by which time whatever it was necessary for would be obsolete anyway. There are a number of mathematical algorithms that play a part in this—you can get a glimpse if you take some well-known CSPRNGs and look at how they work.

The most common variants to build such a PRNG are:

  • Using a stream cipher, which already outputs a (supposedly secure) pseudo-random bit stream.
  • Using a block cipher in counter mode

Hash functions on a counter are also sometimes used. Wikipedia has more on this.

General requirements are just that it's unfeasible to determine the original initialization vector from a generator's bit stream and that the next bit cannot be easily predicted.

As for initialization, most CSPRNGs use various sources available on the system, ranging from truly random things like line noise, interrupts or other events in the system to other things like certain memory locations, &c. The initialization vector is preferrably really random and not dependent on a mathematical algorithm. This initialization was broken for some time in Debian's implementation of OpenSSL which led to severe security problems.


1 Which has its problems too and one has to be careful in eliminating bias as things such as thermal noise has different characteristics depending on the temperature—you almost always have bias and need to eliminate it. And that's not a trivial task in itself.

Joey
Very nice answer. Thank you for taking the time to explain.
Byron Whitlock
It also makes an interesting read how people devise some funky methods to extract some entropy :-) : http://en.wikipedia.org/wiki/Hardware_random_number_generator#Using_observed_events
andras
@Byron: There might be a better one coming up. I'm not exactly an expert on crypto stuff.
Joey
+20  A: 

A cryptographically secure number random generator, as you might use for generating encryption keys, works by gathering entropy - that is, unpredictable input - from a source which other people can't observe.

For instance, /dev/random(4) on Linux collects information from the variation in timing of hardware interrupts from sources such as hard disks returning data, keypresses and incoming network packets. This approach is secure provided that the kernel does not overestimate how much entropy it has collected. A few years back the estimations of entropy from the various different sources were all reduced, making them far more conservative. Here's an explanation of how Linux estimates entropy.

None of the above is particularly high-throughput. /dev/random(4) probably is secure, but it maintains that security by refusing to give out data once it can't be sure that that data is securely random. If you want to, for example, generate a lot of cryptographic keys and nonces then you'll probably want to resort to hardware random number generators.

Often hardware RNGs are designed about sampling from the difference between a pair of oscillators that are running at close to the same speed, but whose rates are varied slightly according to thermal noise. If I remember rightly, the random number generator that's used for the UK's premium bond lottery, ERNIE, works this way.

Alternate schemes include sampling the noise on a CCD (see lavaRND), radioactive decay (see hotbits) or atmospheric noise (see random.org, or just plug an AM radio tuned somewhere other than a station into your sound card). Or you can directly ask the computer's user to bang on their keyboard like a deranged chimpanzee for a minute, whatever floats your boat.

Apologies for the lack of inline links, but apparently I have to burn more of my lifetime on this website before I'm allowed to post more.

EDIT: belay that. Because some people were kind enough to this bag of bones to click that upwards-pointy thingy on the left, I'm apparently allowed to put the rest of the links in now. Thank you kind people! ^_^

EDIT: as andras pointed out, I only thought to talk about some of the most common entropy gathering schemes. Thomas Pornin's answer and Johannes Rössel's answer both do good jobs of explaining how one can go about mangling gathered entropy in order to hand bits of it out again.

Richard Barrell
Nice one on gathering entropy. Especially the chimpanzee thing. +1
andras
As for why it is important:"When used with asymmetric ciphers for key transfer, pseudorandom key generators are nearly always used to generate the symmetric cipher session keys. However, lack of randomness in those generators or in their initialization vectors is disastrous and has led to cryptanalytic breaks in the past. Therefore, it is essential that an implementation uses a source of high entropy for its initialization."http://en.wikipedia.org/wiki/Symmetric-key_algorithm
andras
Good clean answer. Thank you. "I have to burn more of my lifetime on this website" - LOL
Byron Whitlock
+1: Good answer, I like that `/dev/random` thing especially.
gorsky
-1 you're describing a 'true' random number generator which is similar but not the same -- true random number generators are cryptographically secure, but not all cryptographically secure random number generators are true random.
Chris Dodd
@Chris: since most of the CSPRNGs work by applying a crypto function to a seed, why the downvote? Johannes has given a pretty good answer to the question: what choices are there if I have a seed? Richard I believe did a reasonably good job of explaining how to get your hands on a seed. So if anything, you should upvote Johannes instead, since these two answers provide an answer *together*. On a side note, please read the whole question, not just the title.
andras
@Chris: I mean it seems the OP would like to know how entropy is gathered/managed in your average PC.
andras
I guess I just read the OPs question completely differently -- yes, if you have a device that reads random noise, you're fine, but what do you do if you DON'T HAVE such a device? So how does a cryptographically secure random number generator locked in a completely deterministic system work? Answer -- it doesn't really; you at least need a little bit of 'true' randomness to seed things, but once you have that, you can generate more bits that are as secure as your cryptosystem. So talking about entropy is something completely different
Chris Dodd
@Chris: No matter how good your (deterministic) CSRNG is, it will never be better than the initial seed. If you only put 64 bits of entropy into your CSRNG, but extract 128 bits for an AES key, then the seed is the weak link and an attacker can find your key by brute-forcing the CSRNG.
Rasmus Faber
Nice answer. one note: /dev/random on linux is not actually secure, /dev/urandom is the secure version.
AviD
@AviD: no, you have it the wrong way around. /dev/urandom is the insecure version - it implements exactly the same algorithm that /dev/random does, but it doesn't stop giving you output when the entropy count hits zero.Just to experimentally verify this, try:dd if=/dev/random of=/dev/nullin a terminal. Let it run for a few seconds then control-C it, at which point it'll print how much data was transferred. The numbers for /dev/random come out close to zero, but /dev/urandom continually spits out 4MB/s of data on my laptop.tl;dr, you're mistaken, good sir and/or madam. Have a nice day. :)
Richard Barrell
+1 Great answer
James Westgate
+2  A: 

In order for a random number generator to be considered cryptographically secure, in needs to be secure against attack by an adversary who knows the algorithm and a (large) number of previously generated bits. What this means is that someone with that information can't reconstruct any of the hidden internal state of the generator and give predictions of what the next bits produced will be with better than 50% accuracy.

Normal pseudo-random number generators are generally not cryptographically secure, as reconstructing the internal state from previously output bits is generaly trivial (often, the entire internal state is just the last N bits produced directly). Any random number generator without good statistical properties is also not cryptographically secure, as its output is at least party predictable even without knowing the internal state.

So, as to how they work, any good crypto system can be used as a cryptographically secure random number generator -- use the crypto system to encrypt the output of a 'normal' random number generator. Since an adversary can't reconstruct the plaintext output of the normal random number generator, he can't attack it directly. This is a somewhat circular definition an begs the question of how you key the crypto system to keep it secure, which is a whole other problem.

Chris Dodd
A nice explanation, thank you. However I think your last sentence is what really bugs the OP, when he says: "I know there are instruments that read cosmic white noise to help generate secure hashes, but your standard PC doesn't have this."
andras
andras: I don't see that as the OPs question at all, but perhaps he wants to clarify what he is asking.
Chris Dodd
+11  A: 

For cryptographic purposes, what is needed is that the stream shall be "computationally indistinguishable from uniformly random bits". "Computationally" means that it needs not be truly random, only that it appears so to anybody without access to God's own computer.

In practice, this means that the system must first gather a sequence of n truly random bits. n shall be large enough to thwart exhaustive search, i.e. it shall be infeasible to try all 2^n combinations of n bits. This is achieved, with regards to today's technology, as long as n is greater than 90-or-so, but cryptographers just love powers of two, so it is customary to use n = 128.

These n random bits are obtained by gathering "physical events" which should be unpredictable, as far as physics are concerned. Usually, timing is used: the CPU has a cycle counter which is updated several billions times per second, and some events occur with an inevitable amount of jitter (incoming network packets, mouse movements, key strokes...). The system encodes these events and then "compresses" them by applying a cryptographically secure hash function such as SHA-256 (output is then truncated to yield our n bits). What matters here is that the encoding of the physical events has enough entropy: roughly speaking, that the said events could have collectively assumed at least 2^n combinations. The hash function, by its definition, should make a good job at concentrating that entropy into a n-bit string.

Once we have n bits, we use a PRNG (Pseudo-Random Number Generator) to crank out as many bits as necessary. A PRNG is said to be cryptographically secure if, assuming that it operates over a wide enough unknown n-bit key, its output is computationally indistinguishable from uniformly random bits. In the 90's, a popular choice was RC4, which is very simple to implement, and quite fast. However, it turned out to have measurable biases, i.e. it was not as indistinguishable as was initially wished for. The eSTREAM Project consisted in gathering newer designs for PRNG (actually stream ciphers, because most stream ciphers consist in a PRNG, which output is XORed with the data to encrypt), documenting them, and promoting analysis by cryptographers. The eSTREAM Portfolio contains seven PRNG designs which were deemed secure enough (i.e. they resisted analysis and cryptographers tend to have a good understanding of why they resisted). Among them, four are "optimized for software". The good news is that while these new PRNG seem to be much more secure than RC4, they are also noticeably faster (we are talking about hundreds of megabytes per second, here). Three of them are "free for any use" and source code is provided.

From a design point of view, PRNG reuse much of the elements of block ciphers. The same concepts of avalanche and diffusion of bits into a wide internal state are used. Alternatively, a decent PRNG can be built from a block cipher: simply use the n-bit sequence as key into a block cipher, and encrypt successive values of a counter (expressed as a m-bit sequence, if the block cipher uses m-bit blocks). This produces a pseudo-random stream of bits which is computationally indistinguishable from random, as long as the block cipher is secure, and the produced stream is no longer than m*2^(m/2) bits (for m = 128, this means about 300 billions of gigabytes, so that's big enough for most purposes). That kind of usage is known as counter mode (CTR).

Usually, a block cipher in CTR mode is not as fast as a dedicated stream cipher (the point of the stream cipher is that, by forfeiting the flexibility of a block cipher, better performance is expected). However, if you happen to have one of the most recent CPU from Intel with the AES-NI instructions (which are basically an AES implementation in hardware, integrated in the CPU), then AES with CTR mode will yield unbeatable speed (several gigabytes per second).

Thomas Pornin
Thanks for taking the time. I really appreciated reading it. :) +1
andras