In C#, at least, generating a new GUID is a one-line, one-call process.

A GUID is easy to use and format, and it "guarantees" uniqueness.

However, is it irresponsible to just go off and generate new GUIDs for every little thing? Could we be significantly increasing the chances of a "ta tan tan" GUID collision?!

Thoughts...?

A: 

As long as you're not generating 10,000 GUIDs every instant, I doubt there's a problem. Time-based GUIDs (the version 1 scheme) include a temporal component, so GUIDs generated on the same machine in different seconds will not collide.

For GUIDs generated within a second, the algorithm also includes a randomizer, so you probably don't have to worry about that either. I can't remember how big this field is off the top of my head, but it should be roughly "big enough" ;)

Mike Caron
A: 

There will be no GUID collisions:

When learning about GUIDs, it feels like 38 measly digits aren’t enough. Won’t we run out if people get GUID-crazy, assigning them for everything from their pets to their favorite bubble gum flavor?

Let’s see. Think about how big the Internet is: Google has billions of web pages in its index. Let’s call it a trillion (10^12) for kicks. Think about every Wikipedia article, every news item on CNN, every product on Amazon, every blog post from any author. We can assign a GUID for each of these documents.

Now let’s say everyone on Earth gets their own copy of the internet, to keep track of their stuff. Even crazier, let’s say each person gets their own copy of the internet every second. How long can we go on?

Over a billion years.

More here.

Dave Swersky
That page also says, "collisions are still possible." Unlikely, but possible.
Matthew Flaschen
+2  A: 

GUIDs are 128-bit values. You could generate a million a second until the sun burned out and not come close to exhausting them. Specifically, the birthday paradox applied to a random 128-bit value means you need roughly 2 × 10^19 values before you reach a 50% chance of a collision.
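As a rough check on the birthday-paradox figure, here is a minimal Python sketch using the standard approximation p ≈ 1 - exp(-n(n-1)/2N). It assumes all 128 bits are uniformly random, which is slightly optimistic for real GUIDs, and the function names are made up for illustration.

```python
import math

def collision_probability(n: int, bits: int = 128) -> float:
    """Approximate chance of at least one collision among n uniformly random values."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

def values_for_probability(p: float, bits: int = 128) -> float:
    """Approximate number of random values needed to reach collision probability p."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

if __name__ == "__main__":
    print(f"p(collision) for 10^19 values: {collision_probability(10 ** 19):.3f}")
    print(f"values needed for p = 0.5:     {values_for_probability(0.5):.3e}")
```

Under that assumption, the 50% point sits at a little over 2 × 10^19 values, and 10^19 values gives a probability well below one half.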

More specifically, 10^38 or '38 digits' as you say only seems like a measly number because of the way it's expressed. Try writing it out on a piece of paper as a 1 with 38 zeros after it. Even then you probably can't picture anything like what it represents. To do that, it's easier to break it down somehow. The best way to do that is to illustrate some comparative values.

There are about 10^7.5 seconds in a year.

There are about 10^9.8 people on earth.

There are about 10^10.1 years in the current age of the universe.

So even if every person currently on earth had generated a million GUIDs every second since the beginning of time, you would still have used up less than 1/10,000th of the domain.
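The arithmetic behind that estimate can be sketched in a few lines of Python. The exponents are the rounded figures quoted above, the domain is taken as the full 2^128, and the variable names are only illustrative.

```python
import math

# Rounded figures from the comparison above.
SECONDS_PER_YEAR = 10 ** 7.5
PEOPLE_ON_EARTH = 10 ** 9.8
AGE_OF_UNIVERSE_YEARS = 10 ** 10.1
GUIDS_PER_PERSON_PER_SECOND = 10 ** 6

# Total GUIDs generated if everyone had been at it since the beginning of time.
generated = (PEOPLE_ON_EARTH * GUIDS_PER_PERSON_PER_SECOND
             * SECONDS_PER_YEAR * AGE_OF_UNIVERSE_YEARS)

# Size of the full 128-bit domain.
domain = 2 ** 128

fraction = generated / domain

if __name__ == "__main__":
    print(f"generated ~ 10^{math.log10(generated):.1f}")
    print(f"domain    ~ 10^{math.log10(domain):.1f}")
    print(f"fraction  ~ 10^{math.log10(fraction):.1f}")
```

The fraction comes out at roughly 10^-5, comfortably under the 1/10,000 mark.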

Jherico
Yeah, I just ran the numbers through `bc`. If you generated one random 128-bit number per nanosecond (GUIDs aren't fully random numbers, but still), you'd exhaust half the number space in 5,395,141,535,403,007,094,485 years. I'm not motivated enough to look up the appropriate name for that number.
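For anyone without `bc` handy, a rough reconstruction of that calculation in Python might look like this. The 365-day year is an assumption on my part; it lands close to the figure quoted above.

```python
NS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a plain 365-day year

# Half of the 128-bit number space.
half_space = 2 ** 127

# One value per nanosecond, so the count of values equals elapsed nanoseconds.
years = half_space // (NS_PER_SECOND * SECONDS_PER_YEAR)

if __name__ == "__main__":
    print(f"{years:,} years")
```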
Nicholas Knight
@Mark: v4 is random (assuming a proper RNG) save for always having a '4' at the start of data3, which throws the numbers off by little enough as to not matter for practical purposes.
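The fixed '4' is easy to see with Python's standard `uuid` module, used here only as a stand-in for whatever generator you actually call. A small sketch:

```python
import uuid

u = uuid.uuid4()          # random (version 4) UUID
text = str(u)             # xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx

# The third field ("data3" in the GUID struct) carries the version
# in its top four bits.
data3 = u.time_hi_version
version_nibble = data3 >> 12

if __name__ == "__main__":
    print(text)
    print("version nibble:", version_nibble, "reported version:", u.version)
```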
Nicholas Knight
It really depends on whether by GUID you mean the UUID standard or one of the particular algorithms for generating them. Many early algorithms used components like a timecode, a sequence value and the MAC address to generate the value, so no, those were not evenly distributed. More recent algorithms use a pseudorandom value or a SHA-1 cryptographic hash, and so the values can be treated as evenly distributed across the domain.
Jherico
A: 

GUIDs can be generated in a number of ways; most often, a hash of several things that are likely to be unique at a given point in time (the IP address, the clock date/time, and so on) is used to generate the unique ID. Each system may have a slightly different algorithm for generating it.

Besides that, think of the scope you are using your GUIDs in. It's extremely unlikely that your app will generate identical GUIDs.

Jeroen
A: 

The other side of the coin is that GUIDs tend to be a bit expensive to make, and ungainly to process and use. Nothing more fun than deciphering a zillion 32-char IDs that sorta look alike.

Will Hartung
"deciphering a zillion 32 CHAR IDs that sorta look alike" -- welcome to IPv6, want a t-shirt? :)
Nicholas Knight