views:

134

answers:

5

I've read before here on SO that there are some hash algorithms (I think one of those is adler32) that support the following property:

adler32('abc'); // 123
adler32('def'); // 456
adler32('abcdef'); // 579 (123 + 456)

Please note that the results are only examples to demonstrate what I want to archieve. I've tried some examples with the hash extension in PHP with the adler and fletcher modules but the values don't seem to add up. Can someone show me some implementation examples?

+1  A: 

Unless I misunderstand I don't see how this is possible without an unnecessary number of collisions.

hash('cde') => 345 
hash('aaabcd') => 345
Byron Whitlock
technically speaking, no hash is possible without collisions. ;)
Kip
*correction: unless the hash size is larger than the maximum data size
Kip
Good point, but it seems this algorithm would make an unnecessary number of them.
Byron Whitlock
Could you explain why those two would have to collide?
Pete Kirkham
A: 

In view of Alder32 algorithm description, I don't see how the property you describe with regard to concatenation of string is possible.

Edit: removed flawed demo that such a property for a hash is impossible. Indeed an example of such a hash is the one that simply adds (well maybe modulo some big number, but that is understandable) the ASCII value of the characters in the input. (Thank you Pete Kirkham to point this error of mine).

Rather than Alder32 you may be referring to homomorphic encryption algorithms. You may find a list of such encryption schemes here. Craig Gentry, a research at IBM, also announced his success with producing fully homomorphic encryption, but I do not know if any technical details were published at this time (This would be big news, BTW).

mjv
Nor is it exhibited by putting a, aa, aaa etc into http://textop.us/Hashing/Adler-32
Pete Kirkham
The requirement from the OP was that hash('abcabc') = hash('abc')+hash('abc')=2*hash('abc'), so it's true for any x such that 2 x = x + x.
Pete Kirkham
Argh.. my bad, in my precipitation, I made a big boo boo. let me move this out out of there...
mjv
+2  A: 

If what you are looking for is a hash function on strings such that hash(s1+s2) = hash(s1) + hash(s2), I am pretty sure that all functions hash with this property are of the form (sum of the hash_char(c) for all the chars c in the string), for a fixed function hash_char.

That does not make for a very good hash function. Do you not misremember what you saw about adler32?

Pascal Cuoq
Proof: for any hash function that satisfies the condition hash(s1+s2) = hash(s1) + hash(s2), define hash_char(c)=hash("c"). Then for any string s=c1c2..cn, hash(s)=hash_char(c1)+...+hash_char(cn). Good for finding anagrams, but as Byron said, too many collisions for other applications.
Pascal Cuoq
I guess I probably misunderstood the question / answer I'll try and look for it sometime soon. =)
Alix Axel
A: 

Adler32 is a checksum algorithm rather than a hash, and doesn't have the property you describe.

It's quite common for hashes to have the property that the state of the hash can be described by the result of the hash, so if

H ( "aaa" ) = G ( H0, "aaa" )

where G is the function of the hash and the concatenation to the hash, and H0 is an initial value, often zero. Then

H ( "aaa" ) = G ( H("aa"), "a" ) = G ( G ( H("a"), "a" ), "a" ) = G ( G ( G ( H0, "a" ), "a" ), "a" )

But a function which is simply adding some function of the characters in the input together ( as is implied by your rule that G ( x .concat. y ) = G(x) + G(y) ) will collide for all permutations of that input.

One such function might create a 128 bit hash from an ASCII input:

H(x) = sum ( b => 1 << b foreach byte b in x )

which would have the property that

H("abcdef") == H("abc") + H("def") 
            == H("a") + H("b") + H("c") + H("d") + H("e") + H("f") 
            == H("abcdfe")
            == H("abcfde")
            == H("abcfed")
 etc.
Pete Kirkham
+1  A: 

Adler32 is not a hash function.
There are no good hash functions with this property.

There are, however, encryption functions with a similar property; known as homomorphism.
Basically:

E(text1)*E(text2)=cipher  
D(cipher) = text1 + text2

Where E is the encryption function, D is the encryption function, and the texts are numbers (or data serialized as a number). Note that schemes using operations other than + & * do exist.

Two examples schemes: ElGamal, and Paillier. Both are slow, which is a common feature of asymmetric encryption schemes. Both of these examples use * & +.

Kevin Montrose