tags:

views:

342

answers:

5

I've been tasked with implementing an XOR hash for a variable length binary string in Perl; the length can range from 18 up to well over 100. In my understanding of it, I XOR the binary string I have with a key. I've read two different applications of this online:

  1. One of the options is if the length of my key is shorter than the string, I divide up the string into blocks that are the length of the key; these are then all folded together (so the length of the resulting hash would be the length of the key).
  2. I've also read that you just XOR the key across each key-length block of the string (so the resulting hash would be the length of string).

Is one of these more correct than the other? This is for hashing values in an index, so I'm inclined to think the first option (which could produce shorted hashes) would be better.

Finally, is there a good way to generate a sufficiently random key? And is there a good length to choose for the key based on the length of the strings to be hashed?

EDIT: By the way, I am very aware of how bad this hash works. It's strictly for comparison purposes. :)

A: 

The first technique can be used to create a quick and dirty hash of the string.

The second technique can be used to create a quick, dirty and terribly insecure symmetric encryption of the string.

If you want a hash, use the first method (or even better, pick an existing hash function off-the-shelf.)

The randomness of the key isn't going to be your biggest issue - the whole technique is insecure.

The longer the key, the more distinct hash values you will get, the less likely you have a collision. It doesn't take long before collisions are very rare for moderately sized data sets.

Oddthinking
+1  A: 

Generally you want your hash values to all be a consistant length. The second method you describe sounds like encryption where you want to recover your data, the first is a one way hash.

Nate Heinrich
+2  A: 

One other alternative, from here (search for XOR hashing).

Assuming the hash is supposed to be x bytes long, break the message into blocks of x bytes; and xor them together. This is effectively the same as using method 1 with a key of x 0's. (or, alternatively, starting with a key of the first x bytes of the string, and ignoring those first bytes of the string. All manner of fun ways to think about it)

(Also note what is said about XOR hashing, namely that it is bad. Very bad.) (Roughly. It's better then alternatives, but it is not sufficient for a lot of what hashing is used for)

EDIT: One other small thing; if method 1 uses the same key across all binary strings that are hashed; then it doesn't really matter what the key is. xor'ing against a constant is akin to, say, ROT13. <sarcasm>Alternatively, if you use SHA1 to derive a key per string... that might make the XOR hash much better.</sarcasm>

key xor key == 0 //always
key xor (((key xor msg1) xor msg2) xor msg3) 
== (msg1 xor msg2 xor msg3)
CoderTao
+1 `(Also note what is said about XOR hashing, namely that it is bad. Very bad.)`
Cam
Haha, I'm aware. It's for comparison's sake with better hashes.
allie
+1  A: 

xor is not a really good way to hash:

1 is sort of a hash since you realy cant get the original data back, with or without a key. i suggest using sha2 (224/256/384/512), md5, ripemd160 or whirlpool, if you can

2 is an xor cipher with a repeating key. it is definitely not a hash.

as for generating random numbers, you can find programs that generate irrational numbers in hex (like pi: 3.243F6A8885A308D313198A2E03707344A4093822299....)

calccrypto
Using the decimals of Pi would not be random at all...
Laurent
fine, then the decimal part of the hex value of some other irrational number that people dont think of, like rad(53.3)
calccrypto
A: 

If you want to perform a 'hash' that only uses XOR, I'd simply split the string up into blocks of some predetermined size X. Don't forget to somehow compensate for when the input string is smaller than X.

Cam