ansaurus

Question

How Random is System.Guid.NewGuid()? (Take two)

Answer 1

A:

They are random so that it is mathematcially provable that collisions should not occur for a very long time, so that you can assume that they are unique globally. However, they are not cryptographically strong, since this would require a true randomness, which isn't really possible in computers without dedicated hardware.

Lucero 2010-04-12 11:17:31

True, but I think that device noise makes a pretty good source of entropy.

Vilx- 2010-04-12 11:19:49

Errr..... cryptographically strong random number generators do exist. They can be done in 100% software. (Although hardware support does make it much easier to write them, and much easier to prove security). On Windows PCs, CryptGenRandom() is the normal one.

user9876 2010-04-12 11:21:44

Well, just the fact there were known problems with it until Windows Vista and that the algorithm is not published and just takes into account a very long list of specific information to make it as random as possible doesn't mean that a replay is impossible, at least in a virtualized lab environment (which doesn't mean that it is unsafe to use in the wild). http://en.wikipedia.org/wiki/CryptGenRandom

Lucero 2010-04-12 11:26:08

If it is mathematically provable that collisions should not occur then by definition it is NOT random at all! That's like saying a roulette table is only random if a number doesn't come up a second time until all the others have.

Robin Day 2010-04-12 11:34:14

Answer 2

+2 A:

APIs that produce random bytes but which are not explicitly documented to produce cryptographically strong random bytes cannot be trusted to produce cryptographically strong random bytes.

If you need cryptographically strong random bytes, then you should be using an API which is explicitly documented to produce them.

public Guid CreateCryptographicallyStrongGuid() {
    var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
    var data = new byte[16];
    rng.GetBytes(data);
    return new Guid(data);
}

These GUIDs are simply 128 bits of cryptographic randomness. They are not structured, and they will not collide.

See this article for some of the math. Using "The General Birthday Formula", rearranging gives

n = sqrt(-2T * ln(p))

where n is the number of chosen elements, T is the total number of elements (2^128), and p is the target probability that all n chosen elements will be different. With p = .99, this gives *n = 2.61532104 * 10^18*. This means that we can generate a billion truly random GUIDs per second within a system for a billion seconds (32 years), and have better than 99% chance at the end that each one is unique within the system.

Justice 2010-04-12 11:23:00

This might produce pseude-random values, but you loose the guarantee that the generated guid is unique (although the likeliness of a collision has a low probability if you only generate few values).

0xA3 2010-04-12 11:29:14

@0xA3 - What if I generate high volumes of GUIDs on many networked machines? (Think: heavily used business application which uses GUIDs for all its primary keys in the DB).

Vilx- 2010-04-12 12:25:51

@Vilx: Even if you generate really many GUIDs on many networked machines the values are guaranteed to be unique, see Raymond Chen's post: http://blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx (although if you create sufficient many GUIDs you will eventually get a collision, but I guess the universe will no longer exist at that point in time).

0xA3 2010-04-12 12:33:23

@0xA3 - that post details an ancient GUID generation algorithm (version 1 UUID) which hasn't been used for over a decade. See my Wikipedia link about UUID and re-read my post. And I was asking about uniqueness of the above code snippet.

Vilx- 2010-04-12 12:38:20

Does this code snippet generate valid GUIDs? I can't see where you set the bits that indicate the GUID type.

user9876 2010-04-12 12:46:03

@user9876 - I don't think it's a requirement that the GUID should have the special bits. Some algorithms do that (including the ones that Microsoft ships with Windows), but it's not incorrect to create fully random ones.

Vilx- 2010-04-12 12:59:34

Edited for extra commentary and math.

Justice 2010-04-12 13:45:50

@Vilx: According to RFC4122 (http://tools.ietf.org/html/rfc4122) even UUID version 4 should set version bits and 2 other reserved bit.

0xA3 2010-04-12 15:46:06

Then this snippet generates cryptographically random bits that fill a `System.Guid` struct and that can be used exactly like all other UUIDs, but which is not in and of itself a UUID in compliance with RFC4122. I do not recall ever needing an RFC-compliant UUID, rather than simply a 128-bit random or even cryptographically-random number.

Justice 2010-04-12 16:29:27

Answer 3

+18 A:

The accepted answer to a related question states:

A GUID doesn't make guarantees about randomness, it makes guarantees around uniqueness. If you want randomness, use Random to generate a string.

Anything else is an implementation detail (and might change).

Update: To make my point clearer: Even if the current .NET 3.5 implementation produced a truly random guid (which is not the case) there is no guarantee that this would be the case in the future or true for other implementations of the BCL (e.g. Mono, Silverlight, CF, etc)

Update 2: The format of UUID is specified by RFC4122. Section 6 makes an explicit statement on security:

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation.

0xA3 2010-04-12 11:23:07

Good answer on the security view. I trust GUID's to be unique, for record identifiers, but do not rely on them for determining application security.

Wez 2010-04-16 07:42:05

Answer 4

+5 A:

The definition of Random in no way relates to the definition of Globally Unique.

Flipping a coin twice and getting HH, HT, TH, TT are all random. HH is just as random as HT.

Flipping a "special" coin twice and guaranteeing that you will only get HT or TH is uniqueness.

Robin Day 2010-04-12 11:31:34

Answer 5

A:

GUIDs are designed to be at number 2 on your scale, i.e. "can be used for generating IDs in a DB because duplicates are unlikely*."

As for security, the problem isn't "it's not good for security because the results can be predicted with sufficient effort.". The problem is that no-one gives you a documented security guarantee.

In practise, according to this comment and this one, the GUID generation is implemented in terms of a cryptographically secure RNG (CryptGenRandom). But that appears to be an undocumented implementation detail. (And I haven't verified this - it's random comments on the Internet, take with a truckload of salt).

(* Where "unlikely" means something like "the chances of anyone finding a duplicate GUID before the end of the universe are less than the chances of you personally winning the lottery." Implementation bugs excepted, of course.)

user9876 2010-04-12 13:02:44

Answer 6

+4 A:

Some people have already hinted at that but I want to repeat it since there appears to be a misconception there:

Randomness and uniqueness are orthogonal concepts.

Random data can be unique or redundant, and likewise unique data can use a random source or a deterministic source (think a global counter that is locked and incremented for every GUID ever created).

GUIDs were designed to be unique, not random. If the .NET generator appears to use random input, fine. But don’t rely on it as a source of randomness, neither for cryptographical nor for any other purposes (in particular, what distribution function do you expect to get?). On the other hand, you can be reasonably sure that GUIDs created by .NET, even in large volumes, will be unique.

Konrad Rudolph 2010-04-12 13:31:38

That's a great explanation.

sharptooth 2010-09-08 11:55:26

Answer 7

A:

I read somewhere that the chances of winning the lottery would be equivalent to 2 4-byte "GUIDs" colliding. The standard 16-byte GUIDs would offer much less chance of collision.

ck 2010-04-12 15:43:40

Answer 8

A:

Focussing on your question of using GUID's as row identifiers:

GUID's are for databases geared towards replication, or generating rows ahead of time before adding them to the DB. If you do not need GUID's to solve a particular problem, try stick to incremental numbering. GUID's complicate debugging and testing a little bit.

The COMB method in the article you mention seems pretty great, actually. I never realized, thanks for that one! (p.s. the printer friendly version of that article reads much better)

So if you don't need to generate a GUID ahead of time, you can let the database handle the GUID generation for you. The speed differences you'll only notice if you start adding 10,000's of records in one go, which you shouldn't do anyway, that's what bulk-importing is for.

Also have a look at Jeff on ID's vs GUID's

create table #temp ([id] uniqueidentifier primary key default(newid()), [name] varchar(20))
insert into #temp (name) values ('apple')
insert into #temp (name) values ('orange')
insert into #temp (name) values ('banana')
select * from #temp
drop table #temp

id                                   name
------------------------------------ --------------------
911B0CBD-4EED-4EB0-8488-1B2CDD915C02 banana
56CF3A80-A2DE-4949-9C9B-5F890824EA9C orange
5990B9FD-143D-41B0-89D1-957B2C57AB94 apple

Wez 2010-04-15 08:17:19

The need for GUIDs in my case is complex. First of all I'm using NHIbernate ORM, and secondly I'd like to create objects in my app and then let the user decide whether or not to save them (eg. whether the user presses OK or Cancel). If I used IDENTITY NHibernate would need to insert the record as soon as it was created to get the ID (which is what I don't want). GUIDs allow me to create IDs without touching the DB, and COMB takes care of performance problems. But it can fail if duplicate GUIDs are generated because of subtleties in implementation not known by the original authors.

Vilx- 2010-04-15 08:55:23

ansaurus

tags:

views:

answers:

How Random is System.Guid.NewGuid()? (Take two)

related questions