views:

3976

answers:

15

I'm working on a database that uses a GUID for each user that uses the app it's tied to. Somehow, two users ended up with the same GUID. I know that microsoft uses an algorithm to generate a random GUID that has an extremely low chance of causing collisons, but is a collision still possible?

This is in SQL Server 2000 btw

A: 

Impossible if the users have different machines with network cards, and even if not it is still an extremely marginal almost theoretical risk.

Personally I'd look elsewhere as it is more likely a bug rather than a GUID clash...

Providing of course that you don't chop bits off the GUID to make it shorter.

Richard Harrison
The GUIDs would be generated on the Server, so the user's network cards would not come into play.
Tom Ritter
+25  A: 

Basically, no. I think someone went mucking with your database. The first 60 bits are a timestamp, and another 14 are a uniquifier.

Tom Ritter
Beaten to the punch! I was writing something very similar, with the exact same link! +1!
John Rudy
That's what I figured, but I just wanted to make sure I couldn't rule that out. You never know what kinds of weird bugs might pop up in 8 year old software. :)
Jason Baker
upvote for linking to Mr. Chen.
Joel Coehoorn
Maybe in SQL Server 2000 they are not possible, but generally they are, as there are for ways of creating GUIDs according to standard and one way uses a random number generator (except for a few bytes that are hard coded). Votes--;
Mecki
Actually that isn't true any more. It was true for v1 GUIDs, but not for the current v4 ones. See http://en.wikipedia.org/wiki/Globally_Unique_Identifier#Algorithm for more info.
Greg Beech
Greg is right, unless you use UuidCreateSequential that has not been true since ~2000 or 2001...
KristoferA - Huagati.com
+2  A: 

Of course its possible....Probable? Not likely, but it is possible.

Remember, the same machine is generating every GUID (the server), so a lot of the "randomness" that is based on machine specific information is lost.

FlySwat
IMO, you made a mistake somewhere. The odds of a collision are so low as to be insignificant.
Mitch Wheat
sorry. This comment went in the wrong answer!
Mitch Wheat
+6  A: 

See Wikipedia's Globally Unique Identifier article. There are several ways to generate GUIDs. Apparently the old (?) way used Mac address, a timestamp down to a very short unit and a unique counter (to manage fast generations on the same computer), so making them duplicate is nearly impossible. But these GUIDs were dropped because they could be used to track down users...

I am not sure of the new algorithm used by Microsoft (the article says a sequence of GUIDs can be predicted, looks like they no longer use timestamp? The Microsoft article linked above says something else...).

Now, GUIDs are carefully designed to be, by name, globally unique, so I will risk it is impossible, or of very very very low probability. I would look elsewhere.

PhiLho
+10  A: 

They are theoretically possible, but with 3.4E38 possible numbers, if you create tens of trillions of GUIDs in a year the chance of having one duplicate is 0.00000000006 (Source).

If two users ended up with the same GUID, I would wager that there is a bug in the program which is causing the data to be copied or shared.

Ben Hoffstein
"but with 3.4E38 possible numbers" - no. Two GUIDs generated almost simultaneously on the same machine would end up with extremely similar GUIDs.
Just Some Guy
That would depend on how the GUID is generated, and some implementations based on the CPU time or milliseconds will (hopefully) exagerate whatever calculation its based off of so two GUID's generated from milliseconds apart will have a vast difference.
Dalin Seivewright
With more than 1 processor on a machine, if a guid is based on time and mac address then each core could issue the same guid at the same moment in time.
AndyM
A: 

Sure it's possible, and maybe even likely. It's not like each GUID is in a random portion of the possible number space. In the event that two threads attempted to generate one simultaneously, barring some kind of centralized GUID function with a semaphore around it, they could end up with the same value.

Just Some Guy
A: 

I'll preface this with "I'm not a networking person, so I may make completely incoherent sentences following.".

When I worked at Illinois State University, we had two Dell desktops, ordered at different times. We put the first one on the network, but when we tried to put the second one on the network we started receiving crazy errors. After much troubleshooting, it was determined that both machines were producing the same GUID (I'm not sure exactly what for, but it made them both unusable on the network). Dell actually replaced both machines as defective.

John Kraft
probably same MAC address, not GUID....
KristoferA - Huagati.com
It was specifically the GUID. It had something to do with the GUID generated by the machines when they joined the network. It took several weeks for Dell to replace the machines because they said it was impossible for the GUIDs to be the same. We were able to reproduce the problem, Dell took the machines back, and were able to produce the same results on their networks. They ended up replacing both machines. As I said, I'm not a networking person, but I specifically remember it was a problem with GUIDs.
John Kraft
+13  A: 

Basically they are not possible !, the chances are astronomically low.

But... I'm the only person I the world that I know of, that had a GUID colision once (yep!).

And I'm sure of it, and that it wasn't a mistake.

How did it happen, in a small application that was running on Pocket PC, at the end of an operation a command that has an generated GUID must be issued. The command after it was executed on the server it was stored in a command table on the server along with the execution date. One day when I was debugging I issued the module command (with the newly generated GUID attached) and nothing happened. I did it again (with the same guid, because the guid was generated only once at the beginning of the operation), and again, and nothing, finally trying to find out why the command isn't executing, I checked the command table, and the same GUID as the current one was inserted 3 weeks ago. Not believing this, I restored a database from 2 weeks backup, and the guid was there. Checked the code, the new guid was freshly generated no doubt about it. Pow guid collision, happened only once, but I really wish I would have won at lotto instead,the chance is greater :).

Edit: there are some factors that could have greatly increased the chance of this happening, the application was running on the PocketPC emulator, and the emulator has a save state feature, which means that every time the state is restored the local time is restored also and the guid is based on on the internal timer....also the guid generating algorithm for compact framework might be less complete than for example the COM one...

Pop Catalin
Why the downvote, this is a true story :(, I even told it to the local developers group.
Pop Catalin
IMO, you made a mistake somewhere. The odds of a collision are so low as to be insignificant.
Mitch Wheat
@Mich that's a possibility too, but, I've checked coudn't find the mistake anywhere.
Pop Catalin
Joshua
+1  A: 

Could the code used to generate a GUID have a bug in it? Yes, of course it could. But the answer is the same as it would be for a compiler bug - your own code is orders of magnitude more likely to be buggy, so look there first.

Mark Ransom
+4  A: 

The chances of two random GUIDs colliding (~1 in 10^38) is lower than the chance of not detecting a corrupt TCP/IP packet (~1 in 10^10). http://wwwse.inf.tu-dresden.de/data/courses/SE1/SE1-2004-lec12.pdf, page 11. This is also true of disk drives, cd drives, etc...

GUIDs are statistically unique and the data you read from the db is only statistically correct.

Tony Lee
+3  A: 

I would consider Occam's razor as a good guide in this case. It is incredibly unlikely that you have a GUID collision. It is much more likely you have a bug, or someone messing with your data.

Jason Jackson
+7  A: 

First lets look at the chance of collision of two GUIDs. It is not, as other answers have stated, 1 in 2^128 (10^38) because of the birthday paradox, which means that for a 50% chance of two GUIDs colliding the probability is actually 1 in 10^64 (10^19) which is a lot smaller. However, this is still a very large number, and as such the probability of collision assuming you are using a reasonable number of GUIDs is low.

Note also that GUIDs do not contain a timestamp or the MAC address as many people also seem to believe. This was true for v1 GUIDs but now v4 GUIDs are used, which are simply a pseudo-random number which means that possibility of collision is arguably higher because they are no longer unique to a time and a machine.

So essentially the answer is yes, collisions are possible. But they are highly unlikely.

Greg Beech
While I agree with all of your facts, be careful with your math. To say that you have a 1 in 10^19 chance of any two GUIDs colliding depends on how many GUIDs are in the set. For that chance you need ~2^32 GUIDs, so in nearly all real-world scenarios the odds are much lower.
DocMax
+1 for mention of birthday paradox issue
Jimmy
+2  A: 

Two Win95 machines that have ethernet cards with duplicate MAC addresses will issue duplicate GUIDS under tightly controlled conditions.

Joshua
+1  A: 

Just for grins, try the following script... (works on SQL 2005, not sure about 2000)

declare @table table
(
    column1 uniqueidentifier default (newid()),
    column2 int,
    column3 datetime default (getdate())
)

declare @counter int

set @counter = 1

while @counter <= 10000
begin
    insert into @table (column2) values (@counter)
    set @counter = @counter + 1
end

select * from @table

select * from @table t1 join @table t2 on t1.column1 = t2.column1 and t1.column2 != t2.column2

Running this repeatedly (takes less than a second) produces a fairly wide range from the first select, even with an EXTREMELY short time gap. So far the second select hasn't produced anything.

GalacticCowboy
You need another 15 zeros at the end of the counter to have a 50% chance of a duplicate. But, for Pete's sake don't do it!
Jim Birchall
Yeah, I ran it with 100000 (one hundred thousand) and it took about a minute...
GalacticCowboy
+1  A: 

I know people like the feel-good answer that GUIDs are magical and guaranteed to be unique, but in reality, most GUIDs are just 121-bit random numbers (seven of the bits are wasted on formatting). If you wouldn't feel comfortable using a big random number, then you shouldn't feel comfortable using a GUID.

Rick Yorgason