views:

248

answers:

9

I have a 128bit ID that I want to perform a one way hash on, but I don't want to ever get the same digest for an input message. Does anyone know if sha-1, or an alternative, is guaranteed not to produce collisions for the set of messages less than its output digest size? This is at least theoretically possible...

I also considered using RSA, and discarding the private key to give me a one-way encrypt, but I need to store the result in a 32 char DB field, and the encryption schemes available to me don't produce anything small enough.

Any suggestions of another way of producing a deterministic, non-reversable and collision free transform of the original value are welcome.

+5  A: 

Cryptographic hashes give a very good approximation of a random number for a given input. So how many random hashes do you need in a room until you get the same 160 bits? It about the square root (disclaimer: I am not a statistician). So you should expect to see clashes at around 80-bits.

I guess practicalities mean you should know when cosmic rays will be a bigger problem than collisions.

Tom Hawtin - tackline
According to this paper http://www.ictroma3.it/public/uploads/Finding_Collisions_in_the_Full_SHA-1.pdf I think you should expect clashes around 69-bits.
hoang
In fact within 63-bits : http://www.rsa.com/rsalabs/node.asp?id=2927
hoang
@hoang That's an attack complexity of 2^63, not a message length. So you should be able to find a collision by only computing 2^63 hashes. But if we were to talk about "randomly" selected messages it is still 2^80. Of course if you selected specific messages (and had a sufficient mechanism to find them), you only need two for a collision!
Tom Hawtin - tackline
+2  A: 

Does anyone know if sha-1, or an alternative, is guaranteed not to produce collisions

Hash functions were designed not to produce collisions, but nothing is "guaranteed." On the contrary, it is guaranteed that there WILL be collisions, because the message space is practically indefinite, while you have a limited number of possible hashes.

SHA-1 however has been proven to be collision-resistant, and that's the best you can hope for.

quantumSoup
The message space isn't indefinite if it's 128-bit.
Skilldrick
I understand that sha-1 must produce collisions when the message is larger than the output, my questions is regarding the specific case where the input size is less than or equal to the output size
Malcolm Smith
Not "message", but "message space." I get your question, but as I said, there are no guarantees. Just work on the assumption that they are collision-resistant, because nobody has yet proved it otherwise.
quantumSoup
128-bit isn't infinite, sure, but (to me) it's really really big! 2^128 is over 10^38. The universe is only around 10^17 seconds old, so a collision would be like asking people to pick 2 specific seconds in time (since the beginning of the universe), and someone picking the same pair as someone else. It may just be my feeble mind, but I'd agree with calling that "practically indefinite". :-)
Ken
+3  A: 

Your ID is unique and 128 bits.

Your comments explain that you cannot use the ID as-is.

You need it to be unique, not just probably unique. Therefore, you cannot use a hash.

You cannot have both worlds - you cannot have a 1:1 mapping that is not reversible. Its an impossibility.

Encrypting - a bijective operation, so there'll be no collisions - IDs with a secret key will make reversing the ID to determine its original value very hard.

AES has a nice block length of 128 bits which will generate 128 bits output from your 128 bits of input, is faster than old algorithms (!) and is widely available for most platforms and languages. I suggest you use AES for your purpose.

Will
The ID is sensitive in its raw form, I want to transform it in such a way that the original ID is unobtainable but in a deterministic way, so the same original always transforms to the same output. 160 bits is just what sha-1 does. I'm not particularly after 160.
Malcolm Smith
Thanks, that pretty much confirms the way my thinking has been led on this, abeit with the exception that you can make a 1:1 mapping very hard to reverse using encryption and keeping the key secret.
Malcolm Smith
@Malcolm Smith, yeap, the 'very hard' is certainly true; it only fails at the higher bar of "unobtainable but in a deterministic way". I'll suggest you use encryption so you can accept my answer ;)
Will
@Will Hmmm, nice try but your answer is a bit too messy to be acceptable.
Malcolm Smith
It's an important point that the stated requirements ("non-reversable and collision free") are in essential conflict.
caf
+1  A: 

I don't know what hash functions avoid collisions but, if you can't find the answer here, a good starting point might be Perfect Hash Function on Wikipedia. From that page:

A perfect hash function for a set S is a hash function that maps distinct elements in S to distinct integers, with no collisions.

There's a number of links to more information on that page that you may find useful.

That being said, can you say why you need a perfect has function? It may be that there are other ways to accomplish what you want to without needing that property, and someone here may be able to make a suggestion.

RHSeeger
Perfect hashes are difficult to write – impossible for arbitrary string inputs – so you typically have to make do with hash-equality not implying value-equality (but the other direction of implication is vital). There *are* some very good hash algorithms though.
Donal Fellows
@Donal - Fair enough, but bear in mind that his inputs are a fairly limited set (128bit key), so it might be possible for him. (Random aside... I followed the discussion on hashes on the Tcl Core list, which certainly expanded my knowledge of hashes and the issues involved :)
RHSeeger
A: 

Isn't the point of a one-way hash that you can't (in general) recover the original data from the hashed value? So why would someone designing a hash function go out of their way to prevent collisions for small inputs?

Instead, it looks like you want to obscure the data, which is fine for some purposes. If it's not practical to use public key encryption, you might as well write your own function.

Justin K
As a simple example, if one had a table in which one had to be able to query people by Social Security Number but one wished to keep the Social Security Number confidential(*), and if one had a hashing function that yielded a different output for each SSN, one could use the hash of the SSN as the primary key for the table. If the hash function produced collisions for two SSN's that were both supposed to be stored in the same table, such an approach would fail.(*) Illustrative example only, since 1,000,000,000-item hash space would be brute-force searchable.
supercat
I'm not disputing that such a function would be useful. I'm just saying that it's a different use case than that for one-way hash functions.
Justin K
Yes, I'm aware that sha-1 was not designed to hash small values, and therefore would not be designed to ensure no collisions for the message space less than or equal to its digest size, but wondered if anyone could definitvely tell me: yes, or no, or suggest another way of producing a deterministic, non-reversable and collision free transform of the original value.
Malcolm Smith
That's the thing: the way to guarantee that the function is collision-free is to make it reversible. You can't lose any information in the transformation, or you risk collisions because the output space is smaller than the input space. The best you can do is use some number theory to make it hard to reverse, as in the RSA idea you had. I'll let you know if anything comes to mind.
Justin K
A: 

According to this article http://www.debian-administration.org/users/dkg/weblog/48,

US gov't federal agencies have been directed to cease all reliance on SHA-1 by the end of 2010

However, as far as I know no collision has been found yet , which means , even people which have tried really hard haven't found any collisions. So it seems reasonable to assume that no collision happen (if you are not dealing with high-security data)

mb14
The no collision found, stands for any size of the message , so even for short message, the probability of collision is almost null.
mb14
+1  A: 
supercat
Interesting, but you missed some of your example implementation: for (i=0; i ...Does this paper cover what you are proposing?http://www.farcaster.com/papers/crypto-field/node1.html
Malcolm Smith
Thanks for the link. It would seem that the size of the numbers required would probably be similar to those required for secure RSA encryption, at least for the simple spin function shown. If one were to use an exponentiation operation which operated in some base other than base 2, I don't know how that would affect things. One advantage of base 2 is that if N is prime, 2^N-1 is often prime. By contrast, k^N-1 is never prime for other k unless N is one.
supercat
+5  A: 

If you want to compute a secret permutation from 128 bits to 128 bits, one simple solution is to use a 128-bit block cipher like AES with a fixed but secret key. You must, of course, be able to keep the key secret forever or you've got nothing.

GregS
Exactly what I was going to suggest.
Brendan Long
yep, that's what he really needs.
irreputable
This would indeed be ideal, but unfortunately raises other issues around key management that make it unattractive to my particular circumstancies.
Malcolm Smith
A: 

Hashing is "unlikely" to produce any duplicates, but there are no guarantees. On the other hand, any symmetric encryption scheme will produce 128 bits out for 128 bits in, and guarantee no duplicates.

On the other hand, if you are depending on the hashes being unique, my intuition is you're doing something wrong. If you're using hashes to obfuscate passwords for example, you have to be careful that you don't make the hashed password the de facto password.

ddyer