views:

78

answers:

2

I am working on a Fountain Code based file transfer system. In this system blocks of data are downloaded, combined with an xor function. I want to verify the blocks as they arrive.

What I need is a cryptographically secure hash function which has the property:

Hash(A) ^ Hash(B) == Hash(A ^ B)

does such a thing exist?

Note: The data blocks must be combined with an xor function, the hashes can be combined with any function you like, so long as it's reasonably cheap to compute.

+2  A: 

What you want is called a Homomorphic Hash. I'm not up with the latest developments, but the one I saw described is extremely - almost unfeasibly - slow to compute. The original paper is here, and a followup with some refinements to its use is here.

As for combining blocks, the hash typically requires you use addition in a prime field. If you're using fountain codes, you don't have to use xor, though - any reversible function is fine, and that includes addition. The hash described above works on addition and multiplication in a prime field, and is provably secure.

Nick Johnson
Those papers you link look extremely promising, I shall take a read of those :)
Martin
Those papers address the exact problem I'm trying to solve. perfect!
Martin
+6  A: 

If the identity you are requesting is exactly

Hash(A) ^ Hash(B) == Hash(A ^ B)

then no, no such cryptographically secure hash function is possible. This is because your function would then be a linear map (over the field with two elements) from the space of possible blocks to the space of possible hashes.

What does this mean, in simpler terms?

Well, suppose your map takes blocks of length 6 and returns hashes of length 3, and that these are some of the hashes:

Hash(000001) = 010
Hash(000010) = 111
Hash(000100) = 001
Hash(001000) = 101
Hash(010000) = 110
Hash(100000) = 001

Then you can compute the hash of any given block by linear combinations of the above. For example,

Hash(101000) = Hash(100000) ^ Hash(001000) = 001 ^ 101 = 100.

This means that your hash function can be represented by a 6-by-3 matrix.

What does this imply?

Wikipedia defines the ideal cryptographic hash function as having four main or significant properties:

  • it is easy to compute the hash value for any given message,
  • it is infeasible to find a message that has a given hash,
  • it is infeasible to modify a message without changing its hash,
  • it is infeasible to find two different messages with the same hash.

The first property may be true, of course, but the rest will not be. Inverting the hash function is as simple as solving a system of linear equations, which is easy. I assume you've done this for linear maps over the real numbers, but the exact same approach works here.

If you find an element of the kernel of the hash function, that is, a message K such that Hash(K) is all zeros, then the last property fails also. Take any message M; then M and M^K will have the same hash, because Hash(M^K) = Hash(M)^Hash(K) = Hash(M)^0 = Hash(M). Finding elements of the kernel is easy, also.

The third property is a little more difficult, but can be broken also. (For example, suppose you're hashing a legal contract. Find a few places where some stray commas can be modified or something. Consider the effect that these changes have on the hash function, and then solve a system of linear equations.)

A. Rex
Nick Johnson
Yeah, my answer only addresses the equation `H(x^y)=H(x)^H(y)` -- but given the word "exactly" in my first sentence, it claims to do no more. I'm actually a bit curious about the situation you mention.
A. Rex
Right - I was just pointing out that that's only a subset of the possible solutions. Even if the function for adding blocks had to be xor (which it doesn't, for fountain codes), the function for adding hashes could still be something else.
Nick Johnson
I only said it has to be xor for combining blocks because I already have that implemented, that can be changed if it's really necessary.
Martin