I'm writing something that summarizes the files in a file system by hashing a sample of their contents. It constructs a tree of directories and files. Each file entry has the hash of the file contents. For each directory entry, I want to store a hash of the contents of all files in the directory, including those in sub-directories - I'll call this the directory content hash.

The tricky thing about the directory content hash is that I want it to be independent of the structure of the directory, i.e. the hash should be the same if two directories contain the same files but organized into a different sub-directory structure.

The only two methods I can think of are:

  1. Calculate the MD5 of the concatenation of all file content hashes. In order to get the desired hash properties, I would have to list all of the files in the directory, sort them by their hash, concatenate the sorted hashes, and then run MD5 on the concatenation. This seems slower than I would like. I can do the sorting pretty efficiently by using merge sort while calculating directory content hashes throughout a tree, but I can't get around calculating a lot of MD5 hashes on large inputs.

  2. Combine file content hashes using XOR. Each directory would only need to XOR the file content hashes and directory content hashes of its immediate children. This is very fast and simple, but not very collision resistant. It can't even tell the difference between a directory containing one instance of a file and a directory containing three instances of the same file. (Both methods are sketched below.)
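
For reference, here is a minimal Python sketch of the two methods, assuming file_hashes is already the flat list of per-file 16-byte MD5 digests for the whole tree (the function names are illustrative, not part of the original description):

    import hashlib
    from functools import reduce

    def method1(file_hashes):
        # Method #1: sort the content hashes, concatenate, and MD5 the result.
        return hashlib.md5(b"".join(sorted(file_hashes))).digest()

    def method2(file_hashes):
        # Method #2: XOR the 16-byte content hashes together; order independent,
        # but any file appearing an even number of times cancels itself out.
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                      file_hashes, bytes(16))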

It would be nice if there were a function which could be used similarly to the way XOR is used in method #2, but which is more collision resistant. I think method #1 would be fast enough for this specific case, but in the interest of exploring-all-the-options/intellectual-curiosity/future-applications, I'd like to know whether there's a function that satisfies the description in the title (I have a vague memory of wanting a function like that several times in the past).

Thanks.

A: 

Order-independent hashing of a collection of hashes is essentially what you're looking for, no?

It sounds like any order-independent (commutative and associative) operation, like addition or multiplication, will do the trick for you. Addition has the benefit of overflowing in a nice way. I don't recall whether multiplication will work as well.

In short: add all of your values, ignoring the overflow, and you should get something useful. Any other similar function should do the trick if addition isn't sufficiently collision resistant.
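
A minimal sketch of that suggestion, assuming digests is a list of 16-byte MD5 digests (combine_additive is an illustrative name, not an established API):

    def combine_additive(digests):
        # Sum the digests as 128-bit integers, discarding overflow.
        total = 0
        for d in digests:
            total = (total + int.from_bytes(d, "big")) % (1 << 128)
        return total.to_bytes(16, "big")

Unlike XOR, repetition changes the result: three copies of the same file contribute 3*A mod 2^128, which equals A for only two possible values of A.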

Slartibartfast
Whoa, too simple. I tried it with random numbers. Addition is stable and all bits are evenly distributed... Anyone know how collision resistant this is? Thanks.
jthg
Btw, simple multiplication doesn't work. I'm not sure what the effect is called, but the resulting hashes don't come out evenly distributed. Imagine what would happen if any of the inputs is 0.
jthg
Did you use unsigned integers? And yes, zeros are absorbing in multiplication. To avoid absorption, you can replace all zeros by a fixed non-zero number.
Peter G.
No, I used signed integers.
jthg
Is 'too simple' a criticism or a compliment? You should plan for collisions whichever way you go, and if you get too many, use another method. I suspect the collision rate will be comparable with (though not equal to) that of the original hash function. Addition will handle 1 vs. 3 copies of a file for most cases with cryptographic hash functions, since a collision would require (A + A + A) mod (MAX + 1) to equal A. Note that you have to do X-bit math (where X is the size in bits of your hash function's output) to take full advantage of the hash output, so that may mean bignum math.
Slartibartfast
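
To make the 1-vs.-3 argument above concrete: (A + A + A) mod 2^X equals A only when 2A is divisible by 2^X, i.e. only for A = 0 and A = 2^(X-1). A quick, purely illustrative check for X = 128:

    MOD = 1 << 128
    for a in (0, 1, 1 << 127, MOD - 1):
        print(a, (3 * a) % MOD == a)   # True only for 0 and 2**127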
Compliment. I went ahead and accumulated both a sum and an xor and subtracted the two in the end, so the hash is sum(x1..xn) - xor(x1..xn).
jthg
Actually, subtracting the xor is not a good idea; it introduces a couple of collision cases. Just using the sum by itself works pretty well. I tried both sum and md5 on my file system and got equivalent results.
jthg
A: 

Since the count of items is important but the order isn't, just sort the list of hashes and then hash the sorted list:

find . -type f -print0 | xargs -0 sha1sum | cut -c -40 | sort | sha1sum

This gives the type of hash value which is invariant to the directory arrangement. The -type f and -print0/-0 additions keep directories and filenames containing whitespace from breaking the pipeline.

Dan D
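
For comparison, a Python sketch of the same sort-then-hash idea, for use inside a program rather than a shell pipeline (tree_content_hash is an illustrative name; the whole-file reads keep the sketch short):

    import hashlib, os

    def tree_content_hash(root):
        # Hash every file under root, sort the hex digests, and hash the
        # sorted list; the result ignores how files are arranged in
        # sub-directories, mirroring the pipeline above.
        digests = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                with open(os.path.join(dirpath, name), "rb") as f:
                    digests.append(hashlib.sha1(f.read()).hexdigest())
        outer = hashlib.sha1()
        for d in sorted(digests):
            outer.update((d + "\n").encode("ascii"))
        return outer.hexdigest()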