ansaurus

Question

Is String.hashCode() portable across VMs, JDKs and OSs?

Answer 1

+7 A:

No. From http://tecfa.unige.ch/guides/java/langspec-1.0/javalang.doc1.html:

The general contract of hashCode is as follows:

Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode must consistently return the same integer. The integer may be positive, negative, or zero. This integer does not, however, have to remain consistent from one Java application to another, or from one execution of an application to another execution of the same application. [...]

John Millikin 2008-10-10 07:14:19

Wrong. This applies to java.lang.Object, not java.lang.String. The latter further constrains the specification to the use of a specific implementation.

Gili 2010-02-12 22:58:53

Answer 2

A:

I would like to add that you can override hashCode() (don't forget equals() if you do that) to make sure your business objects return the same hashCode everywhere. Those objects will then at least have a predictable hashCode.
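As a rough illustration of the idea, here is a minimal sketch of a business object whose hashCode() is written out explicitly. The class name CustomerKey and its fields are hypothetical, invented for this example; the point is only that the result depends on nothing but the field values and the documented String.hashCode().

```java
import java.util.Objects;

// Hypothetical business object (name and fields are assumptions for illustration).
final class CustomerKey {
    private final String region;
    private final int number;

    CustomerKey(String region, int number) {
        this.region = Objects.requireNonNull(region, "region");
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CustomerKey)) {
            return false;
        }
        CustomerKey that = (CustomerKey) other;
        return number == that.number && region.equals(that.region);
    }

    // Explicit formula: 31 * region.hashCode() + number.
    // It uses only the field values, never the identity hash from Object.
    @Override
    public int hashCode() {
        return 31 * region.hashCode() + number;
    }

    public static void main(String[] args) {
        CustomerKey a = new CustomerKey("EU", 42);
        CustomerKey b = new CustomerKey("EU", 42);
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());
    }
}
```

Two instances built from the same data compare equal and share a hash code, and that hash code does not depend on where the object happens to live in memory.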

extraneon 2008-10-10 07:38:31

You don't need to override equals if you override hashCode, although I've no idea why you want to.

Tom Hawtin - tackline 2008-10-10 08:05:34

Answer 3

+2 A:

It depends on the type:

If you've got a type which hasn't overridden hashCode() then it will probably return a different hashCode() each time you run the program.
If you've got a type which overrides hashCode() but doesn't document how it's calculated, it's perfectly legitimate for an object with the same data to return a different hash on each run, so long as it returns the same hash for repeated calls within the same run.
If you've got a type which overrides hashCode() in a documented manner, i.e. the algorithm is part of the documented behaviour, then you're probably safe. (java.lang.String documents this, for example.) However, I'd still steer clear of relying on this on general principle, personally.

Just a cautionary tale from the .NET world: I've seen at least a few people in a world of pain through using the result of string.GetHashCode() as their password hash in a database. The algorithm changed between .NET 1.1 and 2.0, and suddenly all the hashes are "wrong". (Jeffrey Richter documents an almost identical case in CLR via C#.) When a hash does need to be stored, I'd prefer it to be calculated in a way which is always guaranteed to be stable - e.g. MD5 or a custom interface implemented by your types with a guarantee of stability.
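For the "calculated in a way which is always guaranteed to be stable" part, one possible sketch in Java uses java.security.MessageDigest with MD5, since that is the example named above. The class and method names (StableDigestSketch, md5Hex) are made up for illustration, and encoding the text as UTF-8 is an assumption: whatever encoding is chosen has to be fixed explicitly, otherwise the bytes being hashed could differ between platforms.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (names are assumptions for illustration).
final class StableDigestSketch {

    // Returns the MD5 digest of the UTF-8 bytes of text as lowercase hex.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello"));
    }
}
```

The value depends only on the input bytes and the digest algorithm, not on the runtime version, which is the property string.GetHashCode() lacked in the story above.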

Jon Skeet 2008-10-10 08:02:33

Dave Cheney 2008-10-12 14:37:02

Dave: In Java 1.1, the docs didn't specify an algorithm for String.hashCode, so it wasn't safe to rely on it, and it was acceptable for it to be changed in Java 1.2. The algorithm is now explicitly documented - breaking it would be violating documented behaviour. (Continued)

Jon Skeet 2008-10-13 07:21:04

It can be relied upon as much as any other documented behaviour: if we can't trust that the documented behaviour of methods will not change between API releases, we're pretty much doomed.

Jon Skeet 2008-10-13 07:21:47

Answer 4

A:

No. Hash algorithms are not guaranteed, unless otherwise specified. So for instance, deserialisation of hash structures needs to recalculate hash codes, and these values should not be stored in the serialised form.

Tom Hawtin - tackline 2008-10-10 08:08:13

Answer 5

+2 A:

According to the docs: the hash code for a String object is computed as

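s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation.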

I am not certain whether this is a formal specification or just Sun's implementation. At the very least, it should be the same on all existing Sun VMs, regardless of platform or operating system.
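One way to check the formula on whatever VM you have at hand is to recompute it yourself and compare. This is only a sketch: the class and method names (StringHashSketch, documentedHash) are invented here, and the loop is Horner's rule applied to the polynomial above rather than a copy of any JDK source.

```java
// Sketch: recompute the documented String hash formula by hand.
final class StringHashSketch {

    // Evaluates s[0]*31^(n-1) + ... + s[n-1] using Horner's rule.
    // int overflow wraps around silently, which is what int arithmetic means here.
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "abc", "String.hashCode() portability"};
        for (String s : samples) {
            // Prints true for each sample if the VM follows the documented formula.
            System.out.println(documentedHash(s) == s.hashCode());
        }
    }
}
```

For "abc" the sum is 97*961 + 98*31 + 99 = 96354, which is what "abc".hashCode() returns on an implementation that follows the documented formula.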

Michael Myers 2008-10-10 15:24:23
