tags:

views:

676

answers:

8

Before you start marking this as a duplicate, read me out. The other question has a (most likely) incorrect accepted answer.

I do not know how .NET generates its GUIDs, probably only Microsoft does, but there's a high chance it simply calls CoCreateGuid(). That function however is documented to be calling UuidCreate(). And the algorithms for creating an UUID are pretty well documented.

Long story short, be as it may, it seems that System.Guid.NewGuid() indeed uses version 4 UUID generation algorithm, because all the GUIDs it generates matches the criteria (see for yourself, I tried a couple million GUIDs, they all matched).

In other words, these GUIDs are almost random, except for a few known bits.

This then again raises the question - how random IS this random? As every good little programmer knows, a pseudo-random number algorithm is only as random as its seed (aka entropy). So what is the seed for UuidCreate()? How ofter is the PRNG re-seeded? Is it cryptographically strong, or can I expect the same GUIDs to start pouring out if two computers accidentally call System.Guid.NewGuid() at the same time? And can the state of the PRNG be guessed if sufficiently many sequentially generated GUIDs are gathered?

Added: To clarify, I'd like to find out how random can I trust it to be and thus - where can I use it. So, let's establish a rough "randomness" scale here:

  1. Basic randomness, taking current time as the seed. Usable for shuffling cards in Solitaire but little else as collisions are too easy to come by even without trying.
  2. More advanced randomness, using not only the time but other machine-specific factors for seed. Perhaps also seeded only once on system startup. This can be used for generating IDs in a DB because duplicates are unlikely. Still, it's not good for security because the results can be predicted with sufficient effort.
  3. Cryptograhpically random, using device noise or other advanced sources of randomness for seed. Re-seeded on every invocation or at least pretty often. Can be used for session IDs, handed out to untrusted parties, etc.

I arrived at this question while thinking if it would be OK to use them as DB IDs, and whether the Guid.comb algorithm implementation together with System.Guid.NewGuid() (like NHibernate does it) would be flawed or not.

A: 

They are random so that it is mathematcially provable that collisions should not occur for a very long time, so that you can assume that they are unique globally. However, they are not cryptographically strong, since this would require a true randomness, which isn't really possible in computers without dedicated hardware.

Lucero
True, but I think that device noise makes a pretty good source of entropy.
Vilx-
Errr..... cryptographically strong random number generators do exist. They can be done in 100% software. (Although hardware support does make it much easier to write them, and much easier to prove security). On Windows PCs, CryptGenRandom() is the normal one.
user9876
Well, just the fact there were known problems with it until Windows Vista and that the algorithm is not published and just takes into account a very long list of specific information to make it as random as possible doesn't mean that a replay is impossible, at least in a virtualized lab environment (which doesn't mean that it is unsafe to use in the wild). http://en.wikipedia.org/wiki/CryptGenRandom
Lucero
If it is mathematically provable that collisions should not occur then by definition it is NOT random at all! That's like saying a roulette table is only random if a number doesn't come up a second time until all the others have.
Robin Day
+2  A: 

APIs that produce random bytes but which are not explicitly documented to produce cryptographically strong random bytes cannot be trusted to produce cryptographically strong random bytes.

If you need cryptographically strong random bytes, then you should be using an API which is explicitly documented to produce them.

public Guid CreateCryptographicallyStrongGuid() {
    var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
    var data = new byte[16];
    rng.GetBytes(data);
    return new Guid(data);
}

These GUIDs are simply 128 bits of cryptographic randomness. They are not structured, and they will not collide.

See this article for some of the math. Using "The General Birthday Formula", rearranging gives

n = sqrt(-2T * ln(p))

where n is the number of chosen elements, T is the total number of elements (2^128), and p is the target probability that all n chosen elements will be different. With p = .99, this gives *n = 2.61532104 * 10^18*. This means that we can generate a billion truly random GUIDs per second within a system for a billion seconds (32 years), and have better than 99% chance at the end that each one is unique within the system.

Justice
This might produce pseude-random values, but you loose the guarantee that the generated guid is unique (although the likeliness of a collision has a low probability if you only generate few values).
0xA3
@0xA3 - What if I generate high volumes of GUIDs on many networked machines? (Think: heavily used business application which uses GUIDs for all its primary keys in the DB).
Vilx-
@Vilx: Even if you generate really many GUIDs on many networked machines the values are guaranteed to be unique, see Raymond Chen's post: http://blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx (although if you create sufficient many GUIDs you will eventually get a collision, but I guess the universe will no longer exist at that point in time).
0xA3
@0xA3 - that post details an ancient GUID generation algorithm (version 1 UUID) which hasn't been used for over a decade. See my Wikipedia link about UUID and re-read my post. And I was asking about uniqueness of the above code snippet.
Vilx-
Does this code snippet generate valid GUIDs? I can't see where you set the bits that indicate the GUID type.
user9876
@user9876 - I don't think it's a requirement that the GUID should have the special bits. Some algorithms do that (including the ones that Microsoft ships with Windows), but it's not incorrect to create fully random ones.
Vilx-
Edited for extra commentary and math.
Justice
@Vilx: According to RFC4122 (http://tools.ietf.org/html/rfc4122) even UUID version 4 should set version bits and 2 other reserved bit.
0xA3
Then this snippet generates cryptographically random bits that fill a `System.Guid` struct and that can be used exactly like all other UUIDs, but which is not in and of itself a UUID in compliance with RFC4122. I do not recall ever needing an RFC-compliant UUID, rather than simply a 128-bit random or even cryptographically-random number.
Justice
+18  A: 

The accepted answer to a related question states:

A GUID doesn't make guarantees about randomness, it makes guarantees around uniqueness. If you want randomness, use Random to generate a string.

Anything else is an implementation detail (and might change).

Update: To make my point clearer: Even if the current .NET 3.5 implementation produced a truly random guid (which is not the case) there is no guarantee that this would be the case in the future or true for other implementations of the BCL (e.g. Mono, Silverlight, CF, etc)

Update 2: The format of UUID is specified by RFC4122. Section 6 makes an explicit statement on security:

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation.

0xA3
Good answer on the security view. I trust GUID's to be unique, for record identifiers, but do not rely on them for determining application security.
Wez
+5  A: 

The definition of Random in no way relates to the definition of Globally Unique.

Flipping a coin twice and getting HH, HT, TH, TT are all random. HH is just as random as HT.

Flipping a "special" coin twice and guaranteeing that you will only get HT or TH is uniqueness.

Robin Day
A: 

GUIDs are designed to be at number 2 on your scale, i.e. "can be used for generating IDs in a DB because duplicates are unlikely*."

As for security, the problem isn't "it's not good for security because the results can be predicted with sufficient effort.". The problem is that no-one gives you a documented security guarantee.

In practise, according to this comment and this one, the GUID generation is implemented in terms of a cryptographically secure RNG (CryptGenRandom). But that appears to be an undocumented implementation detail. (And I haven't verified this - it's random comments on the Internet, take with a truckload of salt).

(* Where "unlikely" means something like "the chances of anyone finding a duplicate GUID before the end of the universe are less than the chances of you personally winning the lottery." Implementation bugs excepted, of course.)

user9876
+4  A: 

Some people have already hinted at that but I want to repeat it since there appears to be a misconception there:

Randomness and uniqueness are orthogonal concepts.

Random data can be unique or redundant, and likewise unique data can use a random source or a deterministic source (think a global counter that is locked and incremented for every GUID ever created).

GUIDs were designed to be unique, not random. If the .NET generator appears to use random input, fine. But don’t rely on it as a source of randomness, neither for cryptographical nor for any other purposes (in particular, what distribution function do you expect to get?). On the other hand, you can be reasonably sure that GUIDs created by .NET, even in large volumes, will be unique.

Konrad Rudolph
That's a great explanation.
sharptooth
A: 

I read somewhere that the chances of winning the lottery would be equivalent to 2 4-byte "GUIDs" colliding. The standard 16-byte GUIDs would offer much less chance of collision.

ck
A: 

Focussing on your question of using GUID's as row identifiers:

GUID's are for databases geared towards replication, or generating rows ahead of time before adding them to the DB. If you do not need GUID's to solve a particular problem, try stick to incremental numbering. GUID's complicate debugging and testing a little bit.

The COMB method in the article you mention seems pretty great, actually. I never realized, thanks for that one! (p.s. the printer friendly version of that article reads much better)

So if you don't need to generate a GUID ahead of time, you can let the database handle the GUID generation for you. The speed differences you'll only notice if you start adding 10,000's of records in one go, which you shouldn't do anyway, that's what bulk-importing is for.

Also have a look at Jeff on ID's vs GUID's

create table #temp ([id] uniqueidentifier primary key default(newid()), [name] varchar(20))
insert into #temp (name) values ('apple')
insert into #temp (name) values ('orange')
insert into #temp (name) values ('banana')
select * from #temp
drop table #temp

id                                   name
------------------------------------ --------------------
911B0CBD-4EED-4EB0-8488-1B2CDD915C02 banana
56CF3A80-A2DE-4949-9C9B-5F890824EA9C orange
5990B9FD-143D-41B0-89D1-957B2C57AB94 apple
Wez
The need for GUIDs in my case is complex. First of all I'm using NHIbernate ORM, and secondly I'd like to create objects in my app and then let the user decide whether or not to save them (eg. whether the user presses OK or Cancel). If I used IDENTITY NHibernate would need to insert the record as soon as it was created to get the ID (which is what I don't want). GUIDs allow me to create IDs without touching the DB, and COMB takes care of performance problems. But it can fail if duplicate GUIDs are generated because of subtleties in implementation not known by the original authors.
Vilx-