views:

183

answers:

1

If my math is right, I can quickly generate a new hash value for the concatenation of two strings if I already have the individual hash values for each string. But only if the hash function is of the form:

hash(n) = k * hash(n-1) + c(n), and h(0) = 0.

In this case,

hash( concat(s1,s2) ) = k**length(s2) * hash(s1) + hash(s2)

eg.

h1  = makeHash32_SDBM( "abcdef",          6 );
h2  = makeHash32_SDBM( "ghijklmn",        8 );
h12 = makeHash32_SDBM( "abcdefghijklmn", 14 );
hx  = mod32_powI( 65599, 8 ) * h1 + h2;

h1  = 2534611139
h2  = 2107082500
h12 = 1695963591
hx  = 1695963591

Note that h12 = hx so this demonstrates the idea.

Now, for the SDBM hash k=65599. Whereas the DJB hash has k=33 (or perhaps 31?) and h(0) = 5381 so to make it work you can set h(0) = 0 instead.

But a modification on the DJB hash uses xor instead of + to add each character.

http://www.cse.yorku.ca/~oz/hash.html

Is there another technique to quickly calculate the hash value of concatenated strings if the hash function uses xor instead of +?

A: 

That would be true if your second hash will be function of initial state of hash. For some kinds of hash-function it's easy to shift them according to new initial state (like xor'e words, or their sum etc). But in general case that's almost impossible (in other case use of hash+"salt" in password matching will be easier to break).

But usually you can use result of hashing of first string and than continue feeding data from second string.

Update:
I guess there is no way to find f(x,y) for:

h_abc = hashOf(h0, "abc")  
h_def = hashOf(h0, "def")  
(h_abcdef = f(h_abc, h_def)) == hashOf(h0, "abcdef")  

But you able to:

h_abc = hashOf(h1, "abc")  
(h_abcdef = hashOf(h_abc, "def")) == hashOf(h0, "abcdef")  

one of the reason for why you can't do that is that "33" isn't power of "2". If it will use "32" (2**5), then:

h_abcdef == (h_abc << (5*len(abc))) xor h_def
ony
I think you are saying that the general case is to start with the hash of the first string and treat it as the h(0) and then feed in each character of the second string to make the new hash.
philcolbourn
@philcolbourn, just don't drop intermidiate object which calculate hash for first string and you'll be able to continue updating of that object to get hash of both strings. Most of interfaces to hashing functions I've seen was able to proceed further after generating _hash result_ for already feeded data.
ony
@ony, in these cases my intermediate object is the hash value, but is there a magic formula for the multiplicative hash that uses `xor`?
philcolbourn
@philcolbourn, see update to answer.
ony
@ony, note quite what I was hoping for, but you have answered the question. I think that len(abc) should be len(def) but in your example it does not matter since both are 3.
philcolbourn