An interesting issue came up recently. We came across some code that uses hashCode() as a salt source for MD5 hashing, which raises the question: will hashCode() return the same value for the same object on different VMs, JDK versions, and operating systems? Even if it's not guaranteed, has it changed at any point up until now?

EDIT: I really mean String.hashCode() rather than the more general Object.hashCode(), which of course can be overridden.

+7  A: 

No. From http://tecfa.unige.ch/guides/java/langspec-1.0/javalang.doc1.html:

The general contract of hashCode is as follows:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode must consistently return the same integer. The integer may be positive, negative, or zero. This integer does not, however, have to remain consistent from one Java application to another, or from one execution of an application to another execution of the same application. [...]
John Millikin
Wrong. This applies to java.lang.Object, not java.lang.String. The latter further constrains the specification to the use of a specific implementation.
Gili
A: 

I would like to add that you can override hashCode() (don't forget equals() if you do that) to make sure your business objects return the same hashCode everywhere. Those objects will then at least have a predictable hashCode.

extraneon
You don't need to override equals if you override hashCode, although I've no idea why you want to.
Tom Hawtin - tackline
+2  A: 

It depends on the type:

  • If you've got a type which hasn't overridden hashCode() then it will probably return a different hashCode() each time you run the program.
  • If you've got a type which overrides hashCode() but doesn't document how it's calculated, it's perfectly legitimate for an object with the same data to return a different hash on each run, so long as it returns the same hash for repeated calls within the same run.
  • If you've got a type which overrides hashCode() in a documented manner, i.e. the algorithm is part of the documented behaviour, then you're probably safe. (java.lang.String documents this, for example.) However, I'd still steer clear of relying on this on general principle, personally.
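To make the third case concrete, here is a minimal sketch of a type whose hash algorithm is part of its documented contract. The Point class and its formula are hypothetical and chosen only for illustration; the demo also prints the hash of a plain Object, which has no documented value and may well differ between runs.

```java
// Hypothetical value type: the hash formula is stated in the doc comment,
// so callers may rely on it in the same way they rely on String.hashCode().
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) {
            return false;
        }
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    /** Returns 31 * x + y. This formula is part of the documented behaviour. */
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class HashKindsDemo {
    public static void main(String[] args) {
        // No override: the value is not specified and may change between runs.
        System.out.println(new Object().hashCode());
        // Documented override: 31 * 1 + 2 = 33 on every run.
        System.out.println(new Point(1, 2).hashCode());
    }
}
```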

Just a cautionary tale from the .NET world: I've seen at least a few people in a world of pain through using the result of string.GetHashCode() as their password hash in a database. The algorithm changed between .NET 1.1 and 2.0, and suddenly all the hashes are "wrong". (Jeffrey Richter documents an almost identical case in CLR via C#.) When a hash does need to be stored, I'd prefer it to be calculated in a way which is always guaranteed to be stable - e.g. MD5 or a custom interface implemented by your types with a guarantee of stability.
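As a rough sketch of the "stable hash" option, the snippet below computes an MD5 digest with java.security.MessageDigest. The class and method names are hypothetical. The input is encoded as UTF-8 explicitly, on the assumption that the result must not depend on the platform's default charset. MD5 is used only because it is the algorithm named above; the same pattern works with any algorithm name MessageDigest supports, such as SHA-256.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: a digest whose output is fixed by the MD5 specification
// rather than by any particular JVM's hashCode() implementation.
class StableDigestDemo {

    // Returns the MD5 digest of the UTF-8 bytes of input as lowercase hex.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Should not occur on a conforming JVM; rethrown as unchecked.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // 5d41402abc4b2a76b9719d911017c592
    }
}
```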

Jon Skeet
Dave Cheney
Dave: In Java 1.1, the docs didn't specify an algorithm for String.hashCode, so it wasn't safe to rely on it, and it was acceptable for it to be changed in Java 1.2. The algorithm is now explicitly documented - breaking it would be violating documented behaviour. (Continued)
Jon Skeet
It can be relied upon as much as any other documented behaviour: if we can't trust that the documented behaviour of methods will not change between API releases, we're pretty much doomed.
Jon Skeet
A: 

No. Hash algorithms are not guaranteed, unless otherwise specified. So, for instance, deserialisation of hash structures needs to recalculate hash codes, and these values should not be stored in the serialised form.
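The sketch below illustrates that point under a few assumptions. Name is a hypothetical key type that caches its hash in a transient field, so the cached value never reaches the serialised form and is recomputed on first use after deserialisation. The round trip goes through a HashSet, which rebuilds its table from its elements when it is read back instead of trusting stored hash values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashSet;

// Hypothetical key type: the cached hash is transient, so it is not serialised.
final class Name implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String value;
    private transient int cachedHash; // 0 means "not computed yet"

    Name(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Name && ((Name) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            // Recomputed lazily, including after deserialisation.
            h = value.hashCode();
            cachedHash = h;
        }
        return h;
    }
}

class SerializedHashDemo {

    // Serialises obj to a byte array and reads it back as a new object.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        HashSet<Name> names = new HashSet<>();
        names.add(new Name("alice"));
        HashSet<Name> copy = roundTrip(names);
        // Lookup works because hashes were recalculated while reading the set back.
        System.out.println(copy.contains(new Name("alice"))); // true
    }
}
```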

Tom Hawtin - tackline
+2  A: 

According to the docs: the hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

I am not certain whether this is a formal specification or just Sun's implementation. At the very least, it should be the same on all existing Sun VMs, regardless of platform or operating system.
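As a quick check of the formula above, the following sketch recomputes it by hand using Horner's rule and prints the result next to String.hashCode(). The class and method names are hypothetical. The arithmetic is done in int, so overflow wraps around, which is what the documented int arithmetic implies.

```java
// Hypothetical helper, not part of the JDK: recomputes the documented formula.
class StringHashDemo {

    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with Horner's rule.
    // int overflow wraps silently, matching int arithmetic in String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "ab", "hello world"};
        for (String s : samples) {
            System.out.println("\"" + s + "\": manual=" + manualHash(s)
                    + " builtin=" + s.hashCode());
        }
    }
}
```

For example, "ab" gives 97 * 31 + 98 = 3105, and the empty string gives 0.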

Michael Myers