views:

471

answers:

1

I often auto-generate an class's hashCode() method using IntelliJ IDEA and typically the method takes the form:

result = 31 * result + ...

My question is what is the purpose of multiplying by 31? I know this is a prime number but why pick 31 specifically? Also, if implementing a hashCode() for a particularly small / large dataset would people approach this problem differently?

+14  A: 

Multiplying by 31 is fast because the JIT can convert it to a shift left by 5 bits and a subtract:

x * 31 == (x << 5) - x

Without any particular extra information, I'd stick to this approach. It's reasonably fast and likely to end up with reasonably well-distributed hash codes, and it's also easy to get right :)

The size of the dataset doesn't really matter, but if you have particular extra information about the values you'll be work with (e.g. "it's always even") then you may be able to design a better hash function. I'd wait until it's an actual problem first though :)

Jon Skeet
Then why not 7? That's a shift by 3 and a subtract. And it's a prime
oxbow_lakes
Thanks Jon. If this is the reason it's strange that IDEA doesn't just put (x << 5) - x in the generated code. Is the JIT likely to actually spot this optimisation?
Adamski
7 allows strings that differ only in two adjacent characters to often end up with the same hash code. In fact pretty much any processor from the last decade or two should be able to manage a multiply by an eight bit number (so long as it is in a register) in a cycle.
Tom Hawtin - tackline
Last time I checked 31 was prime, too.
Bombe
@Adamski yes it is likely to spot it (assuming it's even in a critical path). I'd be shocked if any professional-grade compiler in the last decade can't spot and optimize this sort of thing.
Hank Gay
@Jon Skeet: Do you have an idea, why _Effective Java_ suggests to use `result = 37 * result + ...` (see http://java.sun.com/developer/Books/effectivejava/Chapter3.pdf chapter 3, page 38)? And also, why we can't simply use the left shift `result = (result << 5) + ...` and don't care about subtraction? Why actually a prime is suggested? Any mathematical proof of that? (relative discussions are here http://stackoverflow.com/questions/113511/hash-code-implementation/113600#113600 and there http://stackoverflow.com/questions/2333363/hashcode-comparison-problem/2333425#2333425)
dma_k
@dma_k: I'm afraid I don't know the details of it... only that it's meant to work well. (I thought Effective Java suggested 31 actually... maybe it's the second edition which does so?)
Jon Skeet