tags:

views:

1780

answers:

8

I create a GUID (as a string) and get the Hash of it. Can I consider this Hash to be unique?

+10  A: 

Not as reliably unique as the GUID itself, no.

Just to expand, you are reducing your uniqueness by a factor of 4. going from 16 bytes to 4 bytes of possible combinations

EDIT: As pointed out in the comments hash size will make a difference. The 4 byte thing was an assumption, horrible at best i know, that it may be used in .net, where the default hash size is 4bytes(int). so you can replace what i said above with whatever byte size your hash may be.

mattlant
4 if the hash algorithm is perfect and the hash contains 4 times fewer bits than the GUID - both of which are likely to vary depending on the context, right?
bzlm
Cryptographic hashes (e.g. MD5, SHA1) are 16-20 or more bytes. By hashing the GUID with such hash, he will not reduce uniqueness.
zvrba
In fact, the risk of collision could *increase* after hashing, even if the hash is larger than the GUID. It depends on the algorithm.
bzlm
A hash, by definition, is 'less' unique than the original. [BTW, A GUID is statistically unique]. A 4 byte hash is quite likely to have collisions.
Mitch Wheat
+3  A: 

It's not guranteed to be, due to hash collisions. The GUID itself is almost-guaranteed to be.

For practical reasons you probably can assume that a hash is unique, but why not use the GUID itself?

phjr
+6  A: 

Nope.

See here, if you want a mini GUID: http://blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx

Simon Buchan
+3  A: 

Well, in a word, no.

Let's assume that your hash has fewer bits (n bit) than the GUID (m bits), by the pigeon hole principle, there must exist more than one mapping of some GUID -> hashed string A simply because there are fewer hash strings than GUIDS.

If we assume that the hash has a larger number of bits than the GUID, there is a very small, but finite, chance of a collision, assuming you're using a good hash function.

Patrick
+1  A: 

No hash function that reduces an arbitrary sized data block to a fixed size number of bits will produce a 1-to-1 mapping between the two. There will always exist a chance of having two different data blocks be reduced to the same sequence of bits in the hash.

Good hash algorithms minimizes the likelihood of this happening, and generally, the more bits in the hash, the less chance of a collision.

Lasse V. Karlsen
+1  A: 

No, and I wouldn't assume uniqueness of any hash value. That shouldn't matter because hash values don't need to unique, they just need to evenly distributed across their range. The more even the distribution, the fewer collisions occur (in the hashtable). Fewer collisions mean better hashtable performance.

fyi For a good description of how hash tables work, read the accepted answer to What are hashtables and hashmaps and their typical use cases?

Robert Paulson
A: 

If you use cryptographic hash (MD5, SHA1, RIPEMD160), the hash will be unique (modulo collisions which are very improbable -- SHA1 is used e.g. for digital signatures, and MD5 is also collision-resistant on random inputs). Though, why do you want to hash a GUID?

zvrba
A: 

If you want a general unique key, this may help:

Generating Unique Keys in .Net

bahith