views:

1605

answers:

9

Is there a way to generate a hash of a string so that the hash itself would be of specific length? I've got a function that generates 41-byte hashes (SHA-1), but I need it to be 33-bytes max (because of certain hardware limitations). If I truncate the 41-byte hash to 33, I'd probably (certainly!) lost the uniqueness.

Or actually I suppose an MD5 algorithm would fit nicely, if I could find some C code for one with your help.

Thanks in advance for any tips/pointers!

EDIT: Thank you all for the quick and knowledgeable responses. I've chosen to go with an MD5 hash and it fits fine for my purpose. The uniqueness is an important issue, but I don't expect the number of those hashes to be very large at any given time - these hashes represent software servers on a home LAN, so at max there would be 5, maybe 10 running.

+1  A: 

I believe the MD5 hashing algorithm results in a 32 digit number so maybe that one will be more suitable.

Edit: to access MD5 functionality, it should be possible to hook into the openssl libraries. However you mentioned hardware limitations so this may no be possible in your case.

Jarod Elliott
your edit beat my answer :)
Jarod Elliott
Yes :) Would you happen to know where I could find some code for that? Thanks!
dennisV
looks like Staale beat me to that one
Jarod Elliott
+5  A: 

The way hashes are calculated that's unfortunately not possible. To limit the hash length to 33 bytes, you will have to cut it. You could xor the first and last 33 bytes, as that might keep more of the information. But even with 33 bytes you don't have that big a chance of a collision.

md5: http://www.md5hashing.com/c++/

btw. md5 is 16 bytes, sha1 20 bytes and sha256 is 32 bytes, however as hexstrings, they all double in size. If you can store bytes, you can even use sha256.

Staale
Thank you - I'm giving it a try...
dennisV
Your BTW is the real answer. If you're short on memory, don't store your hashes as hex-strings!
Andreas Magnusson
md5 is 'more broken-er' than SHA1 and sha256. You'd be better off truncating and using the extra 12 bytes of entropy.
Aaron
I like the idea to XOR the substrings together. At least, you're "keying" the substring you'll use with the rest of the bytes you initially generated.
François P.
+2  A: 

You could use an Elf hash(<- C code included) or some other simple hash function like that instead of MD5 or SHA-X. They are not secure, but they can be tuned to any length you need

Robert Gould
+1  A: 

The chance of a 33-byte collision is 1/2^132 (by the Birthday Paradox)

So don't worry about losing uniqueness.

Update: I didn't check the actual byte length of SHA1. Here's the relevant calculation: a 32-nibble collision (33 bytes of hex - 1 termination char), occurs only when the number of strings hashed becomes around sqrt(2^(32*4)) = 2^64.

Purfideas
+1  A: 

Here is an MD5 implementation in C.

jeffm
+2  A: 

Hashes are by definition only unique for small amount of data (and even then it's still not guaranteed). It is impossible to map a large amount of information uniquely to a small amount of information by virtue of the fact that you can't mmagically get rid of information and get it back later. Keep in mind this isn't compression going on.

Personally, I'd use MD5 (if you need to store in text), or a 256b (32B) hash such as SHA256 (if you can store in binary) in this situation. Truncating another hash algorithm to 33B works too, and MAY increase the possibility of generating hash collisions. It depends alot on the algorithm.

Also, yet another C implementation of MD5, by the people who designed it.

Matthew Scharley
+4  A: 

There is no more chance of collision with substring(sha_hash, 0, 33) than with any other hash that is 33 bytes long, due to the way hash algorithms are designed (entropy is evenly spread out in the resulting string).

Laurent
This isn't entirely true due to the way hashes are calculated. The maths involved is complicated, but partial collisions are much easier to generate than full collisions.
Matthew Scharley
monoxide: Yes, they're easier in proportion to the number of bits. 16 bytes of SHA1 is at least as secure as an MD5. If it were otherwise, the hashes would not be secure.
Nick Johnson
1/2 SHA1 would actually be considered more secure right now. MD5 is 'broken-er' than SHA1
Aaron
1/2 SHA1 (80 bits) is shorter than MD5 (128 bits). I doubt it would be more secure due to how short it is. And MD5 (and SHA1's) brokenness depends on the application. I would agree that SHA1 is a better choice in general, though.
Nick Johnson
+4  A: 

If I truncate the 41-byte hash to 33, I'd probably (certainly!) lost the uniqueness.

What makes you think you've got uniqueness now? Yes, there's clearly a higher chance of collision when you're only playing with 33 bytes instead of 41, but you need to be fully aware that collisions are only ever unlikely, not impossible, for any situation where it makes sense to use a hash in the first place. If you're hashing more than 41 bytes of data, there are clearly more possible combinations than there are hashes available.

Now, whether you'd be better off truncating the SHA-1 hash or using a shorter hash such as MD5, I don't know. I think I'd be more generally confident when keeping the whole of a hash, but MD5 has known vulnerabilities which may or may not be a problem for your particular application.

Jon Skeet
It's not so much that it has vulnerabilites than that it is that computing has advanced to the point where brute forcing it is now practical given the right tools. With the right precautions MD5 is more or less secure. (reads: prepending a salt)
Matthew Scharley
Truncating a hash gives you no guarantee of its uniqueness and should therefor be avoided.
Andreas Magnusson
Andreas: You already have no guarantee of uniqueness. It's a hash - it's making some "best effort" to come up with uniqueness but fundamentally you should always consider hashes to be non-unique.
Jon Skeet
A: 

Use Apache's DigestUtils:

http://commons.apache.org/codec/api-release/org/apache/commons/codec/digest/DigestUtils.html#md5Hex(java.lang.String)

Converts the hash into 32 character hex string.

Swanand