I have an API where various types have custom hash codes. These hash codes are computed by hashing a string representation of the object in question. Various salting techniques are used so that, as far as possible, hash codes do not collide and objects of different types with equivalent string representations get different hash codes.

Obviously, since the hash codes are based on strings, there are some collisions (an unbounded set of strings vs. the limited range of 32-bit integers). I use hashes based on string representations because I need the hashes to persist across sessions, particularly for database storage of objects.

Suddenly today my code has started generating different hash codes for objects, which is breaking all kinds of things. It was working earlier today and I haven't touched any of the code involved in hash code generation.

I'm aware that the .NET documentation allows the implementation of hash codes to change between .NET Framework versions (and between 32-bit and 64-bit versions), but I haven't changed the framework version and, as far as I can remember, there have been no framework updates recently.

Any ideas? This seems really weird.

Edit

Hash codes are generated as follows:

//Compute Hash Code
this._hashcode = 
   (this._nodetype + this.ToString() + PlainLiteralHashCodeSalt).GetHashCode();
A: 

You say that you use this hash code for persistence. That is a bad idea with your current implementation, because you use ToString() to generate the hash code. The result of ToString() has no connection to persistence: a developer may change it for GUI design or some other reason and forget that it is also used for persistence.
In your case I'd look at the result of the ToString() method; maybe it changed. This can happen when the culture changes or when a type is moved to another namespace. Just have a look, and you may find the reason.
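
To illustrate the culture point, here is a minimal sketch of how the same value can produce different strings depending on the culture used for formatting. The class and method names (CultureFormatting, Render) are hypothetical and not part of the asker's code; the sketch only shows why a culture-sensitive ToString() is a fragile input for a persisted hash.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class CultureFormatting
{
    // Formats a number using an explicitly supplied culture.
    // Calling value.ToString() with no argument uses the current
    // thread culture instead, so its output can vary between machines.
    public static string Render(double value, CultureInfo culture)
    {
        return value.ToString(culture);
    }

    public static void Main()
    {
        double value = 1234.5;

        // The decimal separator depends on the culture, so these
        // strings (and any hash built from them) may differ.
        Console.WriteLine(Render(value, new CultureInfo("de-DE")));
        Console.WriteLine(Render(value, new CultureInfo("en-US")));

        // The invariant culture gives the same text on every machine.
        Console.WriteLine(Render(value, CultureInfo.InvariantCulture));
    }
}
```

If the string fed into the hash contains numbers or dates, formatting them with CultureInfo.InvariantCulture is one way to keep that string stable across machines and sessions.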

tanascius
+1  A: 

What StampedeXV is suggesting in his comment is that Object.ToString() returns the fully qualified type name by default if ToString() is not overridden.

  1. Changing the namespace (or class name) would change this value if ToString() is not overridden.
  2. Obviously, overriding ToString() would change it.
  3. Check exactly how and where _nodetype is modified.
  4. PlainLiteralHashCodeSalt remains a mystery (I presume it's a constant string).
  5. Nobody guarantees that String.GetHashCode() will not change, so you could at least use Reflector to get the method's source and include it in your library. This is not something I would usually recommend, but you don't want to depend on the framework's implementation in the future.

Needless to say, you should trace all three values (_nodetype, this.ToString() and the salt string) to check that they haven't changed. If you can revert to an older revision that works, you are halfway there.
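
As an alternative to copying the framework's source for point 5, a library can carry its own string hash whose output is fixed by its definition. The sketch below uses 32-bit FNV-1a over the UTF-8 bytes of the string. This is a swapped-in technique, not what String.GetHashCode() does, and the StableHash class name is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: a string hash that does not depend on the
// framework's String.GetHashCode() implementation.
public static class StableHash
{
    private const uint OffsetBasis = 2166136261; // FNV-1a 32-bit offset basis
    private const uint Prime = 16777619;         // FNV-1a 32-bit prime

    public static int Fnv1a32(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        uint hash = OffsetBasis;
        // Hash the UTF-8 bytes so the result does not depend on
        // platform, process bitness, or framework version.
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            unchecked
            {
                hash ^= b;     // mix in the next byte
                hash *= Prime; // multiply, wrapping on overflow
            }
        }
        // Reinterpret the 32 bits as a signed int, like GetHashCode().
        return unchecked((int)hash);
    }
}
```

It would be called in the same way as the original expression, for example StableHash.Fnv1a32(nodeType + text + salt), where those three names stand in for the question's values. Note that switching algorithms changes every stored value, so any hashes already in the database would have to be recomputed.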

Apart from that, persisting a hash code is not recommended. If this is performance related, note that it is your database's responsibility to take care of indexing and hashing. And since you cannot guarantee it to be unique, then it's also not a GUID. So what's the point then?

But since it's already in the database, your primary concern now is how to get the HashCode implementation back.

Groo
It is performance related and I know the risks associated with it, but combined with other aspects of the persistence approach it makes an orders-of-magnitude difference in performance. The node type is assigned once in the base constructor, and PlainLiteralHashCodeSalt is a constant string.
RobV
Accepted the answer because, looking into things, I realised that the problem was that I was using an old database that had been populated before my change to hash code generation (which was a couple of months ago). That is why the hash codes appeared invalid: the values used for those three things were different at the time (hits head on desk).
RobV