ansaurus

Question

Answer 1

A:

It depends on the hashing algorithm. But theoretically, unless the hash is exactly the same as the original string there is a potential for the hash to not be unique.

A hash of a value is a condensed representation of the original value. By removing pieces of information to create the hash you are losing parts of what make it unique in the domain and therefore increasing the probability that the value will not be unique. The only way to guarantee that it will be unique is to use the original value itself which defeats the purpose of hashing.

Andrew Hare 2010-05-04 19:05:18

"The only way to guarantee that it will be unique is to use the original value itself" - clearly not true!

Martin Smith 2010-05-04 19:27:46

@Martin - Please explain what you mean, _why_ is my statement untrue?

Andrew Hare 2010-05-04 19:33:13

The hash would clearly not have to be the original string for uniqueness to be guaranteed. It could be the original string ROT13 ed for example and still be unique.

Martin Smith 2010-05-04 19:54:04

c.f. Perfect Hash Functions http://en.wikipedia.org/wiki/Perfect_hash_function

Martin Smith 2010-05-04 20:08:35

Answer 2

A:

There is a small possibility that two different values give the same hash. Although very small, it's not unlikely.

Ikke 2010-05-04 19:06:37

Answer 3

+5 A:

In general, the answer is no. This is trivial to show: SHA-1 has 2^160 different outputs - 160 bits, but there are many more inputs that that (e.g., there are 2^320 different 40-byte strings, and they can't all map to a unique output).

Given a sufficient subset of values, the answer is maybe. It depends on the exact algorithm and the size of the subset: if the number of possible inputs is smaller than the number of possible outputs, then it is possible (but NOT guaranteed). When thinking about this, it may be helpful to keep the birthday paradox in mind: the probability of a collision does not increase linearly with the number of inputs.

Michael Madsen 2010-05-04 19:08:24

Thanks. So the only way to get unique values is to generate and scan through the DB to check if it exists (repeat if yes)? That's pretty much what I was trying to avoid doing here, but I guess it's the only way.

Nebs 2010-05-04 19:11:01

Unfortunately, there's no other way if you want to guarantee a unique value. This is also why you can't easily reverse a hash: I can tell you that the SHA-1 hash for "1" is "356a192b7913b04c54574d18c28d46e6395428ab", but there are many other values that will generate that hash.

Michael Madsen 2010-05-04 19:14:41

I see. The thing is I probably won't need to generate more than say 1000 unique values. Would it be safe to say that all the values will be unique in this case?

Nebs 2010-05-04 19:16:08

No. Even with just two values, there is still a 1 in 2^160 chance that they hash to the same value. That may seem infinitely small, but it's still bigger than 0, which means that, although it is *likely* that your 1000 values all hash to different values, it cannot be guaranteed without testing. If you know the possible values in advance, you can calculate the hash for all of them, put them in a list, and sort it by the hash - that will make it easy to find duplicates.

Michael Madsen 2010-05-04 19:23:26

You're right, thanks. I ended up going with uniqid as per RenderIn's suggestion.

Nebs 2010-05-04 20:27:59

ansaurus

tags:

views:

answers:

Hash of unique value = unique hash?

related questions