Mathematically, I suppose it's possible for two random GUIDs generated using the built-in method in the .NET Framework to be identical, but roughly how likely are they to clash if you generate hundreds or thousands?

If you generated one for every copy of Windows in the world, would they clash?

The reason I ask is because I have a program that creates a lot of objects, and destroys some too, and I am wondering about the likelihood of any of those objects (including the destroyed ones) having identical GUIDs.

+5  A: 

A GUID has components based on

  • Time (System clock)

  • Space (System MAC address)

  • Random numbers

So if one is generated for each machine in the world at the same time, they will differ by their MAC addresses and random numbers.
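To see which of these components a particular GUID actually carries, you can look at its version field. Here is a minimal sketch using Python's standard `uuid` module. That is an assumption made purely for illustration: the question is about .NET, and this does not show what the .NET generator does internally.

```python
import uuid

# Version 1: built from a timestamp, a clock sequence, and a 48-bit node ID
# (normally the MAC address, or a random stand-in if none can be found).
u1 = uuid.uuid1()
print(u1.version)    # version field of the time-based GUID
print(hex(u1.node))  # 48-bit node ID (the "space" component)
print(u1.time)       # 60-bit timestamp in 100 ns units (the "time" component)

# Version 4: 122 random bits, with no time or MAC component at all.
u4 = uuid.uuid4()
print(u4.version)    # version field of the random GUID
```

A GUID that reports version 4 has no clock or MAC part, so for those only the random-number component matters.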

Here's a helpful link. http://blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx

Midhat
Oh wow I never realised it was so complex. I assumed it was just one big random number ;) If the GUIDs are all generated in one program, presumably that removes the MAC part of it, and random numbers + time are left... Still unique? Or unique enough at least?
SLC
random number = 7
knoopx
Not all GUIDs have that structure, though it seems like a reasonable choice.
GregS
A: 

It would take a really long time!

codingguy3000
42.............
kenny
+1  A: 

Just to add to Midhat's correct answer, here is a quotation from Eric Lippert's blog about the situation where no network card is installed in the system (and therefore there is no MAC address):

(Machines that do not have network cards generate special GUIDs which are in a "known to be potentially not unique" range.)
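One convention of this kind is described in RFC 4122: when no hardware address is available, the 48-bit node field is filled with a random value and its multicast bit is set, so it cannot be mistaken for a real MAC address. Whether that is exactly the range the quotation means is my assumption. A rough sketch of the check in Python, where `node_is_flagged_random` is a hypothetical helper name:

```python
import uuid

def node_is_flagged_random(node: int) -> bool:
    """True if the 48-bit node ID has the multicast bit set.

    The multicast bit is the least significant bit of the first octet,
    i.e. bit 40 of the 48-bit value.
    """
    return bool((node >> 40) & 0x01)

print(node_is_flagged_random(0x010000000000))  # multicast bit set
print(node_is_flagged_random(0x001A2B3C4D5E))  # made-up unicast address
print(node_is_flagged_random(uuid.getnode()))  # whatever this machine reports
```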

n535
GUIDs haven't been generated using the MAC address for a long time. The Melissa worm took care of that.
Hans Passant
Oh, thanks, I didn't think about it as a security vulnerability, will study Wikipedia now.
n535
A: 

It is hard to calculate the chances without knowing the inner details of the GUID generator's implementation.

You can use combinatorics to get the numbers, but that will only help you if the combinations are equally likely. Therefore, without any statistical knowledge of the implementation, it would be hard to tell the real chances.
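For what it's worth, here is roughly what the combinatorics gives under exactly that equally-likely assumption. This is a sketch only: the default of 122 random bits assumes a version 4 (purely random) GUID, and `collision_probability` is a name I made up for illustration.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one duplicate among n uniform random IDs."""
    space = 2.0 ** random_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-n * (n - 1) / (2.0 * space))

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} IDs -> p ~= {collision_probability(n):.3e}")
```

If the generator's outputs are not uniform, the real odds are worse than these figures, which is the caveat above.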

As opposed to what Midhat implies (if I understood him correctly), GUID collisions are possible. Built-in random number generators are usually implemented using a timestamp-based seed. MAC addresses are not unique by nature, as they can be overwritten in many situations (and they are, at least in some cases I know of). It is possible for two GUID generators to receive the same input and thus yield the same output.

GUIDs are 128 bits long, so "there is enough for everyone to use", but that does not guarantee that collisions won't occur.

M.A. Hanin
+7  A: 

There are ~3E38 possible GUID values. But the Birthday Paradox cuts the number you would have to generate to reach 50/50 odds of a duplicate to ~1E19. That is still an enormous number, comparing quite favorably to the odds that your machine will be destroyed by a meteor impact first, and on top of that the system clock is used to ensure no duplicates can occur.
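The birthday estimate can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that assumes every bit is uniformly random; `ids_for_collision_probability` is a hypothetical helper name. With all 128 bits random it lands in the same order of magnitude as the figure above, and somewhat lower if only 122 bits are random, as in a version 4 GUID.

```python
import math

def ids_for_collision_probability(p: float, bits: int = 128) -> float:
    """Approximate number of uniform random IDs needed to reach duplicate probability p."""
    space = 2.0 ** bits
    # Invert the birthday approximation p ~= 1 - exp(-n^2 / (2 * space)) for n.
    return math.sqrt(2.0 * space * -math.log1p(-p))

print(f"{ids_for_collision_probability(0.5):.2e}")       # all 128 bits random
print(f"{ids_for_collision_probability(0.5, 122):.2e}")  # 122 random bits
```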

Many large and mission-critical database apps use a GUID as the primary key in a table. Don't hesitate to follow their lead.

Hans Passant
Not all kinds of GUIDs use the system clock. Hence a collision is _theoretically_ possible.
Guids don't use clocks, Guid generators do. A meteor exploded over Wisconsin this week, quite spectacular. I liked my father's opinion about theories like that: "If the sky falls, we'll all wear a blue hat".
Hans Passant
And if you get a dupe, buy multiple lotto tickets
PostMan
A: 

Having spent the past 25 years working with RPC and COM (where GUIDs and UUIDs are critical) and with distributed databases where GUIDs are used as unique row identifiers, I have never encountered a collision problem, whether they were generated on a single machine or on different machines. Here is another interesting take on this from MSDN, where GUIDs used as row IDs are much longer lived than those used for objects: http://weblogs.asp.net/wwright/archive/2007/11/04/the-gospel-of-the-guid-and-why-it-matters.aspx

A: 

This isn't something you should be at all concerned with. It's just the availability heuristic at work. It's a "risk" that you know about and recognize, so you want to care about it. But there are many other risks millions of times more likely, that we still don't worry about. The wonderful Pro Git book says it best, I think:

A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.

You would have to be generating millions or billions for it to even be a remote possibility.

Nick Lewis
That's quite worrying if you're a one-man team working in the deep forest in a rickety old hut.
Phil Nash