views:

859

answers:

4

The JDK documentation for java.lang.String.hashCode() famously says:

The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation.

The standard implementation of this expression is:

int hash = 0;
for (int i = 0; i < length; i++)
{
    hash = 31*hash + value[i];
}
return hash;

Looking at this makes me feel like I was sleeping through my algorithms course. How does that mathematical expression translate into the code above?

+21  A: 

unroll the loop. Then you get:

int hash = 0;

hash = 31*hash + value[0];
hash = 31*hash + value[1];
hash = 31*hash + value[2];
hash = 31*hash + value[3];
...
return hash;

Now you can do some mathematical manipulation, plug in 0 for the initial hash value:

hash = 31*(31*(31*(31*0 + value[0]) + value[1]) + value[2]) + value[3])...

Simplify it some more:

hash = 31^3*value[0] + 31^2*value[1] + 31^1*value[2] + 31^0*value[3]...

And that is essentially the original algorithm given.

CookieOfFortune
You may want to explain it in terms of static single assignment (SSA) form, which then removes the need to think about what value "hash" has at any given point in time. :-)
Chris Jester-Young
Looks like the original algorithm says it should be:31^3*value[0] + 31^2*value[1] + 31^1*value[2] + ...Or is it just my fried brain misfiring?
Adnan
Actually, you are correct, I will make the edit.
CookieOfFortune
+8  A: 

Take a look at the first few iterations and you'll see the pattern start to emerge:

hash0 = 0 + s0 = s0
hash1 = 31(hash0) + s1 = 31(s0) + s1
hash2 = 31(hash1) + s2 = 31(31(s0) + s1) + s2 = 312(s0) + 31(s1) + s2
...
Bobby Eickhoff
<3 Thanks for (more or less) writing out CookieOfFortune's answer in SSA form. Much appreciated!
Chris Jester-Young
How do you do subscripts?
CookieOfFortune
Would be even better if you could vertically align all the corresponding terms, and distribute the 31(...) in the third line.
Nikhil Chelliah
@CookieOfFortune: There's an HTML tag for it. Look at the page source. I'd have used Unicode, though.
Nikhil Chelliah
+7  A: 

Proof by induction:

T1(s) = 0 if |s| == 0, else s[|s|-1] + 31*T(s[0..|s|-1])
T2(s) = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
P(n) = for all strings s s.t. |s| = n, T1(s) = T2(s)

Let s be an arbitrary string, and n=|s|
Base case: n = 0
    0 (additive identity, T2(s)) = 0 (T1(s))
    P(0)
Suppose n > 0
    T1(s) = s[n-1] + 31*T1(s[0:n-1])
    T2(s) = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] = s[n-1] + 31*(s[0]*31^(n-2) + s[1]*31^(n-3) + ... + s[n-2]) = s[n-1] + 31*T2(s[0:n-1])
    By the induction hypothesis, (P(n-1)), T1(s[0:n-1]) = T2(s[0:n-1]) so
        s[n-1] + 31*T1(s[0..n-1]) = s[n-1] + T2(s[0:n-1])
    P(n)

I think I have it, and a proof was requested.

Devin Jeanpierre
oh snap! Induction!
John Gardner
+5  A: 

I'm not sure if you missed where it says "^ indicates exponentiation" (not xor) in that documentation.

Each time through the loop, the previous value of hash is multipled by 31 again before being added to the next element of value.

One could prove these things are equal by induction, but I think an example might be more clear:

Say we're dealing with a 4-char string. Let's unroll the loop:

hash = 0;
hash = 31 * hash + value[0];
hash = 31 * hash + value[1];
hash = 31 * hash + value[2];
hash = 31 * hash + value[3];

Now combine these into one statement by substituting each value of hash into the following statement:

hash = 31 * (31 * (31 * (31 * 0 + value[0]) + value[1]) + value[2])
     + value[3];

31 * 0 is 0, so simplify:

hash = 31 * (31 * (31 * value[0] + value[1]) + value[2])
     + value[3];

Now multiply the two inner terms by that second 31:

hash = 31 * (31 * 31 * value[0] + 31 * value[1] + value[2])
     + value[3];

Now multiply the three inner terms by that first 31:

hash = 31 * 31 * 31 * value[0] + 31 * 31 * value[1] + 31 * value[2]
     + value[3];

and convert to exponents (not really Java anymore):

hash = 31^3 * value[0] + 31^2 * value[1] + 31^1 * value[2] + value[3];
Laurence Gonsalves
RE your first sentence: Did you see some evidence that the question or a particular answer was assuming xor?
David Citron
You'd expressed confusion about how the code and the documentation could be equivalent. Since the documentation was using "^" for exponentiation, but Java normally uses it to mean bitwise xor I wondered if that was the source of your confusion. (There were no other answers when I started writing my answer, BTW)
Laurence Gonsalves
Ahh, I see. No, I was aware that it was exponentiation, but unclear on how the implementation followed from the mathematical expression.Your answer clarifies that greatly--but knowing to write that code given only that expression is still a leap for me. To arrive at that code, it would seem that you'd have to write out a small example, realize that you can "multiply by 0 in a clever way" in the innermost nesting to complete the pattern, then form the loop.
David Citron
It would not surprise me at all if the code actually came first, and the documentation was written afterwards.
Laurence Gonsalves