There seems to be an ongoing debate about whether it is safe to rely on the current implementation of String.hashCode(), given that, technically speaking, it is guaranteed by the specification (Javadoc).

  1. Why did Sun specify String.hashCode()'s implementation in the specification?
  2. Why would developers ever need to rely upon a specific implementation of hashCode()?
  3. Why is Sun so afraid that the sky will fall if String.hashCode() is changed in the future? (This is probably explained by #2.)
+7  A: 

A reason for relying on the specific implementation of hashCode() would be if it is ever persisted out into a database, file or any other storage medium. Bad Things(tm) would happen if the data was read back in when the hashing algorithm had changed. You could encounter unexpected hash collisions, and more worryingly, the inability to find something by its hash because the hash had changed between the data being persisted and "now".

In fact, that pretty much explains point #3 too =)
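
To make the persistence scenario concrete, here is a minimal sketch. The "index" is just an in-memory map standing in for something written to disk, and `alternativeHash` is a made-up replacement algorithm (an assumption for illustration, not anything Sun ever shipped):

```java
import java.util.HashMap;
import java.util.Map;

public class PersistedHashDemo {

    // Hypothetical "new" hashing algorithm, invented purely for this example.
    static int alternativeHash(String s) {
        int h = 7;
        for (int i = 0; i < s.length(); i++) {
            h = 17 * h + s.charAt(i);
        }
        return h;
    }

    // Stands in for an index persisted to a file or database: hash -> record.
    static Map<Integer, String> buildIndex() {
        Map<Integer, String> index = new HashMap<>();
        index.put("alice".hashCode(), "record for alice");
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> persistedIndex = buildIndex();

        // Same algorithm at write time and read time: the record is found.
        System.out.println(persistedIndex.get("alice".hashCode()));

        // Algorithm changed between write and read: the lookup misses.
        System.out.println(persistedIndex.get(alternativeHash("alice")));
    }
}
```

The second lookup returns null even though the record is still there, which is the "inability to find something by its hash" problem described above.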

The reason for point #1 could be "to allow interoperability". If the hashCode implementation is locked down, then data can be shared between different implementations of Java quite safely. That is, the hash of a given String will always be the same irrespective of implementation.
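
For reference, the formula in the String.hashCode() Javadoc is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated in int arithmetic. A small sketch that recomputes it by hand and compares against the built-in (the class and method names here are my own, chosen for the example):

```java
public class SpecifiedStringHash {

    // Evaluates the documented polynomial with Horner's rule.
    // int overflow wraps around, which is what the specification expects.
    static int specifiedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "hello", "String.hashCode()"};
        for (String s : samples) {
            // Prints whether the hand-rolled value matches the platform's.
            System.out.println("\"" + s + "\" matches: " + (specifiedHash(s) == s.hashCode()));
        }
    }
}
```

Because the formula is in the spec, any conforming runtime should agree with the hand-rolled version, and that is exactly what makes the value shareable across implementations.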

Rob
Good point! I wonder... could they have achieved the same thing without locking down hashCode()?
Gili
@Gili, not without adding a method called "implementationAndVersionIndependentHashCode()" ;-)
Rob
@Gili if they did not lock down hashCode, how could they be certain that two machines connected via RMI could pass hashes back and forth? My guess is that you just have to give up the concept of a shared hash.
Bill K
@Rob and Bill: couldn't each program (i.e. a specific DB or RMI implementation) provide their own locked-down hash function as opposed to locking it down in the platform?
Gili
@Gili, yeup, of course they could - but what exactly would be the benefit? :)
Rob
@Rob, the benefit would be that the specification would be free of implementation-specific details and only applications that needed to lock down hashcode (a minority in the grand scheme of things) would inherit the limitations of locking down hashcode.
Gili
@Gili, if it's in the spec then it's implicitly not an implementation-detail :) The long and the short of it is that, IMO, making the hashcode algorithm part of the spec for a class as fundamental as "String" is entirely the correct thing to do. Your opinion may vary :)
Rob
@Rob, Fair enough. Thanks ;)
Gili
+3  A: 

The implementation has changed since the original String class. If I recall, it used to be that only every 16th (?) character was used in the hash for "long" strings.

It may have been specified to promote serialization interoperability between subsequent versions of Java, or even between the runtimes of different vendors. I agree that a programmer should not rely on a particular implementation of hashCode() directly, but changing it could potentially break a lot of serialized collections.

erickson
The original specification was to throw an `ArrayIndexOutOfBoundsException`. :) IIRC, the implementation for long strings sampled a fixed number of characters, so it was O(1) instead of O(n), but it was a bad hash, and using the string for anything useful would be (at least) O(n) anyway.
Tom Hawtin - tackline
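
To illustrate why sampling makes for a bad hash, here is a hypothetical sketch. It is not the historical JDK code; the 16-character threshold and the choice of roughly 8 samples are assumptions picked only for the example:

```java
public class SampledHashSketch {

    // Hypothetical sampling hash: short strings use every character,
    // long strings use only about 8 evenly spaced characters.
    static int sampledHash(String s) {
        int n = s.length();
        int step = (n <= 16) ? 1 : n / 8;
        int h = 0;
        for (int i = 0; i < n; i += step) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Two 32-character strings that differ only at index 1.
        String a = new String(new char[32]).replace('\0', 'a');
        String b = a.substring(0, 1) + "b" + a.substring(2);

        // With n = 32 the step is 4, so index 1 is never looked at.
        System.out.println("sampled hashes equal: " + (sampledHash(a) == sampledHash(b)));
        System.out.println("full hashes equal:    " + (a.hashCode() == b.hashCode()));
    }
}
```

Any two long strings that agree at the sampled positions collide, so inputs such as URLs or file paths sharing a long common structure would pile into the same buckets.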