56686

27
+172  Q:

Simple proof that GUID is not unique

I'd like to prove that a GUID is not unique in a simple test program. I expected the following code to run for hours, but it's not working. How can I make it work?

``````BigInteger begin = new BigInteger((long)0);
BigInteger end = new BigInteger("340282366920938463463374607431768211456",10);  //2^128
for(begin; begin<end; begin++)
Console.WriteLine(System.Guid.NewGuid().ToString());
``````

I'm using C#.

+19  A:

GUID is most likely unique.

See this SO entry, and from Wikipedia

While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.

So probably you have to wait for many more billion of years, and hope that you hit one before the universe as we know it comes to an end.

so is 2^128 not the correct number of possible guids?
It is. Why do you think 2^128 is a small number?
Yes, 2^128 is the correct number of possible guids.
That's a hell of a number. `\$ irb >> 2**128=> 340282366920938463463374607431768211456`
@Infinity - Even to you?
+9  A:
``````for(begin; begin<end; begin)
Console.WriteLine(System.Guid.NewGuid().ToString());
``````

You aren't incrementing `begin` so the condition `begin < end` is always true.

sorry, typing mistake - fixed it
Does the loop run at all?
no - cause i can't iterate bigint
You mean "increment"?
@David I sure did. Whoops.
+151  A:

This will run for a lot more than hours. Assuming it loops at 1 GHz (which it won't - it will be a lot slower than that), it will run for 10790283070806014188970 years. Which is about 83 billion times longer than the age of the universe.

Assuming Moores law holds, it would be a lot quicker to not run this program, wait several hundred years and run it on a computer that is billions of times faster. In fact, any program that takes longer to run than it takes CPU speeds to double (about 18 months) will complete sooner if you wait until the CPU speeds have increased and buy a new CPU before running it (unless you write it so that it can be suspended and resumed on new hardware).

damn - so maybe serveral threads generating guids is a better idea?
4 threads on a quad core processor would make it run in 20 billion times the age of the universe - so yeah, that would help a lot.
I am suspicious that this is a troll, but on the off chance that it isn't: threads are not magical. If you can do a billion operations per second on one thread, then going to ten threads means that each one runs 1/10th as often. Each thread does 100 M operations per second; the total number of operations per second is not increased. The way to increase number of operations per second is to buy more computers. Suppose you bought a billion more computers. That would reduce the problem to only taking 10790283070806 years, which is still more than four hours.
I think rjmunro is assuming each thread would run on a separate core; 83 billion universes / 4 cores indeed approximately equals 20 billion universes. Time to buy Intel stock!
Would change my downvote to an upvote but it says the vote is too old unless the answer is edited... :(
aren't you all going off on one? GUIDs are inheriently non random. They are based on the MAC address of the TCP/IP adaptor in yr machine. The Time they were generated and a random number. So given this weighting they are a lot easier to predict if you have the original machine and no when it was generated. See below.
Why don't we just use 83 billion processors? =)
@Tony Lambert - That was one of the GUID generation algorithms, but isn't widely used nowadays because of privacy concerns.
@Erik 83 billion processors means that you will be able to do it in about the amount of time the universe has existed so far. So even that's not nearly enough.
@Eric: Your answer would only be correct on a single-core machine, right?
@Steven. Yes. I suppose I should have said "buy more cores".
@Eric: Yes, roughly 83 billion more...
If I remember correctly generating GUIDs on this platform is serialized, so don't waste your time refactoring your code for multiple cores.
Moores law does not in fact talk about CPU speeds. It talks about the number of transistors per CPU. Which actually means that threads will become very, very relevant to the argument of waiting for faster hardware.
+118  A:

A GUID is theoretically non-unique. Here's your proof:

• GUID is a 128 bit number
• You cannot generate 2^128 + 1 or more GUIDs without re-using old GUIDs

However, if the entire power output of the sun was directed at performing this task, it would go cold long before it finished.

GUIDs can be generated using a number of different tactics, some of which take special measures to guarantee that a given machine will not generate the same GUID twice. Finding collisions in a particular algorithm would show that your particular method for generating GUIDs is bad, but would not prove anything about GUIDs in general.

Pigeonhole Principle to the rescue!
+1 for the sun going cold comment. There was an interesting comment somewhere about the pointlessness of encryption keys > 256 bits. To iterate over all possible key values would take more energy than the entire universe holds. Toggling a bit in the CPU requires a small amount energy (it's what generates the heat) which, when multiplied up 2^256 times is a really massive number exceeding the energy stored in the universe, using E=mc2 the universe would need mass of 2^227kg, our sun is 2^101kg so thats 2^126 suns!
@Skizz: This is true for brute force attacks only. When an encryption scheme is "broken", it means that it can be solved in less time than brute force, but the solve time remains proportional to the key size.
+1  A:

maybe using a dict for storing and checking against.but how to do with lots of threads to reduce time?

``````System.Guid g = System.Guid.NewGuid();
Dictionary<System.Guid, int> guids = new Dictionary<System.Guid, int>();
while(!guids.ContainsKey(g))
{
g = System.Guid.NewGuid();
}
Console.WriteLine("Done!");
``````
You will only actually save time using threads if you use the same amount as the number of cores in your machine. Or cluster of billions -- nay, trillions -- of machines you will need to perform this outrageous task.
Kai, I think you can stop now.
To say nothing of the data storage. Each Guid is 16 bytes. That times 2^128... Well, I don't think there's an SI unit for that.
16 cores and on each running a single thread would safe a lot of time. maybe I can have access to that...
Do you really not understand that that would still take 674,392,691,925,375,886,811 years to compute, as rjmunro generously calculated for you?
How about send the output to the printer then you could check it manually... then you would have hard evidence of your check :)
All that's left of mankind... is one poor, sad little inkjet, spitting out page after page in a cold universe...
@Paul, so you're saying there's a chance?
+7  A:

Presumably you have reason to be believe that the algorithm for producing Guids is not producing truly random numbers, but is in fact cycling with a period << 2^128.

e.g. RFC4122 method used to derive GUIDs which fixes the values of some bits.

Proof of cycling is going to depend upon the possible size of the period.

For small periods, hash table of hash(GUID) -> GUID with replacement on collision if GUIDs do not match (terminate if they do) might be an approach. Consider also only doing the replacement a random fraction of the time.

Ultimately if the maximum period between collisions is large enough (and isn't known in advance) any method is only going to yield a probability that the collision would be found if it existed.

Note that if the method of generating Guids is clock based (see the RFC), then it may not be possible to determine if collisions exist because either (a) you won't be able to wait long enough for the clock to wrap round, or (b) you can't request enough Guids within a clock tick to force a collision.

Alternatively you might be able to show a statistical relationship between the bits in the Guid, or a correlation of bits between Guids. Such a relationship might make it highly probable that the algorithm is flawed without necessarily being able to find an actual collision.

Of course, if you just want to prove that Guids can collide, then a mathematical proof, not a program, is the answer.

A:

Have you tried `begin = begin + new BigInteger((long)1)` in place of begin++?

+90  A:

Of course GUIDs can collide. Since GUIDs are 128-bits, just generate `2^128 + 1` of them and by the pigeonhole principle there must be a collision.

But when we say that a GUID is a unique, what we really mean is that the key space is so large that it is practically impossible to accidentally generate the same GUID twice (assuming that we are generating GUIDs randomly).

If you generate a sequence of `n` GUIDs randomly, then the probability of at least one collision is approximately `p(n) = 1 - exp(-n^2 / 2 * 2^128)` (this is the birthday problem with the number of possible birthdays being `2^128`).

``````   n     p(n)
2^30 1.69e-21
2^40 1.77e-15
2^50 1.86e-10
2^60 1.95e-03
``````

To make these numbers concrete, `2^60 = 1.15e+18`. So, if you generate one billion GUIDs per second, it will take you 36 years to generate `2^60` random GUIDs and even then the probability that you have a collision is still `1.95e-03`. You're more likely to be murdered at some point in your life (`4.76e-03`) than you are to find a collision over the next 36 years. Good luck.

If you're murdered at some point in your life, odds are it'll be at the end.
@mmyers: Excellent point. That means that my chances of being murdered right now are absurdly low, since this isn't the end of my life. Oh, wait...
+1 for discussion of the birthday paradox.
Also, if two GUIDs happen to be created within a short period, the chances they are used within the same system is slight. Therefore, this increases uniqueness.
@mmyers: Absolutely hilarious
These numbers and reference to the birthday problem are meaningless. GUID generation algorithms don't generate values over the entire range with equal probability. In fact IIRC the original algorithm used the MAC address of the generating PC + the current time as part of the result - which reduces the risk of collision with Guids generated on other PCs, but of course reduces the key space.
@Joe: Who says you have to use an off-the-shelf generation algorithm?
+33  A:

If you're worried about uniqueness you can always purchase new GUIDs so you can throw away your old ones. I'll put some up on eBay if you'd like.

Cool - how much for the complete set, from 0 to (2^128)-1 ?
On sale, \$0.01 per 1k GUIDs. I'll throw in some bamboo wind chimes if you order in the next 60 minutes.
My set is more exclusive and of higher quality. They are double checked and verified which makes them worth the \$1 per GUID. You can even buy them in batches if you don't want to make the full investment in one go. I will have to charge an extra \$10 per batch though.
+12  A:

Counting to 2^128 - ambitious.

Lets imagine that we can count 2^32 IDs per second per machine - not that ambitious, since it's not even 4.3 billion per second. Lets dedicate 2^32 machines to that task. Furthermore, lets get 2^32 civilisations to each dedicate the same resources to the task.

So far, we can count 2^96 IDs per second, meaning we will be counting for 2^32 seconds (a little over 136 years).

Now, all we need is to get 4,294,967,296 civilisations to each dedicate 4,294,967,296 machines, each machine capable of counting 4,294,967,296 IDs per second, purely to this task for the next 136 years or so - I suggest we get started on this essential task right now ;-)

+232  A:

Kai, I have provided a program that will do what you want using threads. It is licensed under the following terms: you must pay me \$0.0001 per hour per CPU core you run it on. Fees are payable at the end of each calendar month. Please contact me for my paypal account details at your earliest convenience.

``````using System;
using System.Collections.Generic;
using System.Linq;

namespace GuidCollisionDetector
{
class Program
{
static void Main(string[] args)
{
var reserveSomeRam = new byte[1024 * 1024 * 100];

Console.WriteLine("{0:u} - Building a bigHeapOGuids.", DateTime.Now);
// Fill up memory with guids.
var bigHeapOGuids = new HashSet<Guid>();
try
{
do
{
} while (true);
}
catch (OutOfMemoryException)
{
// Release the ram we allocated up front.
GC.KeepAlive(reserveSomeRam);
GC.Collect();
}
Console.WriteLine("{0:u} - Built bigHeapOGuids, contains {1} of them.", DateTime.Now, bigHeapOGuids.LongCount());

// Spool up some threads to keep checking if there's a match.
// Keep running until the heat death of the universe.
for (long k = 0; k < Int64.MaxValue; k++)
{
for (long j = 0; j < Int64.MaxValue; j++)
{
Console.WriteLine("{0:u} - Looking for collisions with {1} thread(s)....", DateTime.Now, Environment.ProcessorCount);
{
if (bigHeapOGuids.Contains(Guid.NewGuid()))
throw new ApplicationException("Guids collided! Oh my gosh!");
}
);
Console.WriteLine("{0:u} - That was another {1} attempts without a collision.", DateTime.Now, ((long)Int32.MaxValue) * Environment.ProcessorCount);
}
}
Console.WriteLine("Umm... why hasn't the universe ended yet?");
}
}
}
``````

PS: I wanted to try out the Parallel extensions library. That was easy.

And using OutOfMemoryException as control flow just feels wrong.

EDIT

Well, it seems this still attracts votes. So I've fixed the GC.KeepAlive() issue. And changed it to run with C# 4.

And to clarify my support terms: support is only available on the 28/Feb/2010. Please use a time machine to make support requests on that day only.

That last Console.WriteLine made me laugh real hard. I think you should throw a `CommonlyAcceptedCosmologicTheoriesWrongException` instead.
Code that makes me laugh - in a good way! +1
does marking this as Accepted also means that @Kai accepts the terms stipulated by @ligos?
Setting `reserveSomeRam = null;` doesn't actually accomplish anything.
@devinb please explain? it looks like it is freeing the bytes it was previously allocated so that the GC can `Collect()` it. Why doesn't it accomplish anything?
@ligos: You need to put a GC.KeepAlive(reserveSomeRam) somewhere--probably after the try/catch--in order to prevent collection of the array. Hopefully you're offering eternal customer support with your licensing terms...
@ligos: a few comments: code you put in SO answers must be public domain; I believe you mean "universe" and not "univese"; I ran your program in the future; I ran it in reverse time so you'd have to pay me money; surprisingly it didn't find a collision and the reason that the universe has not ended has something to do with highly charged recursive damma fluctuation cores.
@mythz: The JIT will detect that the array referenced by `reserveSomeRam` is never used (i.e., the value in `reserveSomeRam` is never read), so it will be GCed long before the variable is nulled out. As @Curt Nichols mentions, a call to `GC.KeepAlive` would need to be added to prevent this.
@BradleyGrainger I see it now - Good Catch :)
@yairchu: executing my program at any time other than now is not in the EULA, my legal team will be contacting you last Thursday.
Thanks all for the GC.KeepAlive thing.
@ligos, now the memory will never be deleted, plus, the compiler might be too smart for the suggested solution to work correctly anyway (realize the only way out of the loop is through the catch and thus, sees that the original assignment is still never used)
"And using OutOfMemoryException as control flow just feels wrong". If it was written in VB it would be common practice.
GuidCollisionDetector. The name has potential
Thanks for this code. I have put it in my GUID generator code for making cross database foreign keys and it's been a great help in preventing all the collisions we've been experiencing.However, now my application freezes up and I can't see why?! Must be a .Net framework bug or something.
+2  A:

Here's a nifty little extension method that you can use if you want to check guid uniqueness in many places in your code.

``````internal static class GuidExt
{
public static bool IsUnique(this Guid guid)
{
while (guid != Guid.NewGuid())
{ }
return false;
}
}
``````

To call it, simply call Guid.IsUnique whenever you generate a new guid...

``````Guid g = Guid.NewGuid();
if (!g.IsUnique())
{
throw new GuidIsNotUniqueException();
}
``````

...heck, I'd even recommend calling it twice to make sure it got it right in the first round.

This is the absolutely best solution I have read on here so far!
+6  A:

Aren't you all missing a major point?

I thought GUIDs were generated using two things which make the chances of them being Globally unique quite high. One is they are seeded with the MAC address of the machine that you are on and two they use the time that they were generated plus a random number.

So unless you run it on the actual machine and run all you guesses within the smallest amount of time that the machine uses to represent a time in the GUID you will never generate the same number no matter how many guesses you take using the system call.

I guess if you know the actual way a GUID is made would actually shorten the time to guess quite substantially.

Tony

Not all GUIDs are created this way. Even if they were, Kai need only wait until the timestamp used to create the GUID wraps around enough times for one he used to create a GUID is used again.
Guids have not been based on the mac address since 2000 or 2001. As of one of the service packs for NT4 and/or Win2k they changed the algorithm altogether. They are now generated by a random number generator, minus a few bits that identify what kind of guid it is.
not all GUIDs come from windows platforms...
OP mentions C#, so it's Windows. Besides, are V4 GUID's a Windows-only thing?
@Steven: OP mentions C#, so it's... one of several platforms that you can currently compile C# for (.NET, Mono, dotGNU, MonoTouch, ...)
@Martinho: Ah, but Mono's unit test for Guid, in GuidTest.cs, contains a method which creates two new GUID's and checks them for equality, failing if they are equal. As Mono builds successfully, we can be absolutely certain that its GUIDs are unique! :-)
@Martinho: Humor aside, Mono uses a PRNG. http://search.koders.com/csharp/fid93482A6411C4588CEFFA15E52501B793BD3231E6.aspx?s=guid.cs+mono#L2
+15  A:

You can show that in O(1) time with a variant of the quantum bogosort algorithm.

``````Guid g1 = Guid.NewGuid();
Guid g2 = Guid.NewGuid();
if(g1 != g2) Universe.Current.Destroy();
``````
I'm getting an exception when calling Destroy(). Based on the text, I think my computer lacks the necessary hardware to destroy the current universe. Do you know where I might be able to obtain it?
@Steven: Nah, some management guys got too worried about how bad that API would look to the public, and dictated it to always fail for "security reasons". If you look at the method's source there's just that one line: `throw new MundaneHardwareException();`. Anyway, I heard the guys at CERN have some kind of Big Hadron Thingy that might do the trick...
@Martinho: Ah, ok. I'll look into replacing `Universe.Current.Destroy()` with `Cern.Lhc.DestroyThisUniverse()`.
I knew there was a reason I programmed in Haskell. These side effects are getting scary.
"There is a theory which states that if ever anyone ever discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarrely inexplicable.There is another theory that states that this has already happened."-- Douglas Adams, _The Hitchhiker's Guide to the Galaxy_
I am confused. Shouldn't it be `if(g1 == g2)`?
@AMissico: I believe the point is to destroy all possible universes in which you don't find a match so that in the surviving universe(s) you do have a match. Of course, I wonder if the assignments would need to be something like Guid g1 = Universe.Quantum.Branch(Guid.NewGuid()); Otherwise, the deterministic nature of pseudorandom generation of Guid.NewGuid() would tend to produce the same result in all universes branching from the same initial execution.
@Rob: surely if we can destroy universes, we can have truly random (i.e., quantum-based) generators :P
@Martinho: Sure, but that wouldn't be in the built-in implementation of Guid.NewGuid() would it? But my code example above is probably incorrect; you would need to use your truly-random source or a more exact Universe.Quantum.Branch.ChooseBytes(16) (returns a byte array of specified length with a 1-for-1 distribution among the necessary branches generated) to then feed to the Guid creation (or set the value into a blank Guid instance). Or you could chain 128 calls to Universe.Quantum.Branch.Fork() to build up the bits (wich is what ChooseBytes(int) does under the covers).
rollback trans;
+3  A:

You could hash the GUIDs. That way, you should get a result much faster.

Oh, of course, running multiple threads at the same time is also a good idea, that way you'll increase the chance of a race condition generating the same GUID twice on different threads.

+4  A:

340282366920938463463374607431768211456? Let me check ... yes, you're in luck. I can help. I got that number once. Consider it proven.

Maybe even a couple of times ...

HEY! I got that number back in 1996!
+20  A:

[Update:] As the comments below point out, newer MS GUIDs are V4 and do not use the MAC address as part of the GUID generation (I haven't seen any indication of a V5 implementation from MS though, so if anyone has a link confirming that let me know). WIth V4 though, time is still a factor though, and the odds against duplication of GUIDs remains so small as to be irrelevant for any practical usage. You certainly would not be likely to ever generate a duplicate GUID from just a single system test such as the OP was trying to do.

Most of these answers are missing one vital point about Microsoft's GUID implementation. The first part of the GUID is based on a timestamp and another part is based on the MAC address of the network card (or a random number if no NIC is installed).

If I understand this correctly, it means that the only reliable way to duplicate a GUID would be to run simultainous GUID generations on multiple machines where the MAC addresses were the same AND where the clocks on both systems were at the same exact time when the generation occured (the timestamp is based on milliseconds if I understand it correctly).... even then there are a lot of other bits in the number that are random, so the odds are still vanishingly small.

For all practical purposes the GUIDs are universally unique.

There is a pretty good description of the MS GUID over at "The Old New Thing" blog

Thats actually doable when using virtualization. You can and you do get duplicate guids.
Raymond is outdated on the MAC Address part though, Microsoft doesn't use these anymore. See http://en.wikipedia.org/wiki/GUID#Algorithm for the difference between V1 and V4 Guids.
This is no longer the case. The current V5 scheme is just 128 bits of pure pseudorandom goodness.
funny how you state everything I did a month later than me and you get 16 points and I still have 0?
Ya Tony, there is something odd with that. Back when I answered the post, there were only 3 or 4 answers, and I didn't remember seeing yours... if I had, I'd just have upvoted it. I typically don't answer questions when there are other answers already that cover it well enough (which is why I have a rather low overall rep probably).
+1 from me, well done!
+2  A:

A table with 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000 "unique" identifiers in it.

That's a good one - Been there, done that!
Don't forget to put it as a primary key, so we can find the error, LOL
does "dbcc checkdb" show the error?
+2  A:

If you send me an email I will email you back your very own unique GUID. If later examination turns out that it isn't unique after all, I will give you your money back.

+2  A:

The program, albeit its errors, shows proof that a GUID is not unique. Those that try to prove the contrary are missing the point. This statement just proves the weak implementation of some of the GUID variations.

A GUID is not necessary unique by definition, it is highly unique by definition. You just refined the meaning of highly. Depending on the version, the implementator (MS or others), use of VM's, etc your definition of highly changes. (see link in earlier post)

You can shorten your 128 bit table to prove your point. The best solution is to use a hash formula to shorten your table with duplicates, and then use the full value once the hash collides and based on that re-generate a GUID. If running from different locations, you would be storing your hash/full key pairs in a central location.

Ps: If the goal is just to generate x number of different values, create a hash table of this width and just check on the hash value.

+7  A:

Well if the running time of 83 billion years does not scare you, think that you will also need to store the generated GUIDs somewhere to check if you have a duplicate; storing 2^128 16-byte numbers would only require you to allocate 4951760157141521099596496896 terabytes of RAM upfront, so imagining you have a computer which could fit all that and that you somehow find a place to buy terabyte DIMMs at 10 grams each, combined they will weigh more than 8 Earth masses, so you can seriously shift it off the current orbit, before you even press "Run". Think twice!

+5  A:

I don't understand why no one has mentioned upgrading your graphics card... Surely if you got a high-end NVIDIA Quadro FX 4800 or something (192 CUDA cores) this would go faster...

Of course if you could afford a few NVIDIA Qadro Plex 2200 S4s (at 960 CUDA cores each), this calculation would really scream. Perhaps NVIDIA would be willing to loan you a few for a "Technology Demonstration" as a PR stunt?

Surely they'd want to be part of this historic calculation...

hmmmm..... I could run it on our 10,000 node grid at work.
+6  A:

If GUID collisions are a concern, I would recommend using the ScottGuID instead.

+9  A:

Personally, I think the "Big Bang" was caused when two GUIDs collided.

A:

Quantum Computer to the rescue?

Which, of course, the Pigeon Hole Principle is proof enough to logically prove. A brute force proof is unnecessary.
+2  A:

The odds of a bug in the GUID generating code are much higher than the odds of the algorithm generating a collision. The odds of a bug in your code to test the GUIDs is even greater. Give up.

A:

GUIDs are 124 bits because 4 bits hold the version number.

the reason for not adding this as a comment:no one mentioned it, and i don't know who i should tell this to.:)
You could always tell Wikipedia if they don't already know...