views: 411 | answers: 6

Are there any considerations for immutable types regarding hash codes?

Should I generate it once, in the constructor?

How would you make it clear that the hash code is fixed? Should I? If so, is it better to use a property called HashCode instead of the GetHashCode method? Would there be any drawback to it? (Assuming both would work, but the property would be the recommended one.)

+3  A: 

I would generate the hash code once, the first time GetHashCode is called, and then cache it for later calls. This avoids computing it in the constructor when it may never be needed.

If you don't expect to call GetHashCode very many times for each value object, you may not need to cache the value at all.
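A minimal sketch of this lazy approach, written in Java (whose hashCode()/equals() contract mirrors .NET's GetHashCode()/Equals()). LazyPoint and its fields are hypothetical. Some marker for "not computed yet" is needed; here 0 plays that role (an assumption, similar in spirit to the idiom java.lang.String has used), so an object whose real hash is 0 is simply recomputed on every call.

```java
// Sketch: an immutable value type that computes its hash lazily and caches it.
// Assumption: 0 is the "not computed yet" sentinel; a real hash of 0 is just recomputed.
final class LazyPoint {
    private final int x;
    private final int y;
    // Not final: written at most once with a deterministic value, so a race between
    // threads can only cause redundant computation, never a wrong result.
    private int hash; // 0 means "not computed yet"

    LazyPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LazyPoint)) return false;
        LazyPoint other = (LazyPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        int h = hash;       // read the cached field once
        if (h == 0) {
            h = 31 * x + y; // the hash computation (cheap here, possibly expensive elsewhere)
            hash = h;       // cache it; if h is 0 we simply recompute next time
        }
        return h;
    }
}
```

The cache costs one int field per instance and only pays off if the same instance is hashed repeatedly.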

Bill the Lizard
This seems to be the surefire solution if you do want caching.
mquander
@mquander: Yes, but only if you're absolutely certain that you do want it. Jon makes a good case against it in his answer for some very common cases.
Bill the Lizard
Also note that this approach requires one of the following: an additional bit to record that the hash code has not been computed yet; remapping one fixed hash value (say 0) to some other value; or accepting that an object whose computed hash equals the "not yet computed" value won't be cached. All of these add cost and complexity compared with simply calculating the hash on construction, which may or may not offset the savings from delaying the calculation. Performance optimizations are swings and roundabouts: don't do it unless you know you need it.
ShuggyCoUk
+1  A: 

Why do you need to make sure that the hashcode is fixed? The semantics of a hashcode are that it will always be the same value for any given state of an object. Since your objects are immutable, this is a given. How you choose to implement GetHashCode is up to you.

Having it be a private field that is returned is one choice - it's small, easy, and fast.
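The private-field option can be sketched in Java, where hashCode()/equals() play the same role as .NET's GetHashCode()/Equals(). EagerPoint is a hypothetical type; the hash is computed once in the constructor, which is also what the question's "generate it once, in the constructor" option would look like.

```java
// Sketch: compute the hash once in the constructor and return a private final field.
// Hypothetical type; trades one extra int field per instance for a trivial hashCode().
final class EagerPoint {
    private final int x;
    private final int y;
    private final int hash; // computed once; final fields are safely published

    EagerPoint(int x, int y) {
        this.x = x;
        this.y = y;
        this.hash = 31 * x + y; // must use exactly the fields that equals() compares
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EagerPoint)) return false;
        EagerPoint other = (EagerPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return hash; // no computation, just a field read
    }
}
```

Compared with lazy caching, there is no sentinel to manage, but every instance pays for the hash whether or not it is ever hashed.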

plinth
+8  A: 

I wouldn't normally generate it in the constructor, but I'd also want to know more about the expected usage before deciding whether to cache it or not.

Are you expecting a small number of instances, which get hashed an awful lot and which take a long time to calculate the hash? If so, caching may be appropriate. If you're expecting a large number of potentially "throw-away" instances, I wouldn't bother caching.

Interestingly, .NET and Java made different choices for String in this respect - Java caches the hash, .NET doesn't. Given that many string instances are never hashed, and those which are hashed are often only hashed once (e.g. on insertion into the hash table) I think I favour .NET's decision here.

Basically you're trading memory + complexity against speed. As Michael says, test before making your code more complex. Of course in some cases (e.g. for a class library) you can't accurately predict the real-world usage, but in many situations you'll have a pretty good idea.

You certainly don't need a separate property though. Hash codes should always stay the same unless someone changes the state of the object - and if your type is immutable, you're already prohibiting that, therefore a user shouldn't expect any changes. Just override GetHashCode().

Jon Skeet
Thanks Jon. My case is actually having these types in really large numbers, like millions. So I should just calculate the hash code every time GetHashCode is called, right?
Joan Venge
The System.String hashcode implementation is bad anyway, good that .NET doesn't cache it.
Rauhotz
Yup, it sounds like calculating it on the fly is the way to go. Four bytes per instance can mount up when you have a lot of instances.
Jon Skeet
(Assuming it's otherwise a fairly small type, and that the hash code doesn't take long to compute, admittedly.)
Jon Skeet
Yeah, most of the base types I use are structs, and I don't use an expensive hash code.
Joan Venge
+2  A: 

Well, you've got to have a GetHashCode() overridden method, as that's how consumers are going to retrieve your hashcode. Most hashcodes are fairly simple arithmetic operations, that will execute quickly. Do you have a reason to believe that caching the results (which has a memory cost) will give you a noticeable performance improvement?

Start simple - generate the hashcode on the fly. If you think you'll see performance improvements caching it, test first.
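A minimal sketch of "start simple" in Java (the counterpart of overriding GetHashCode() in .NET). SimplePoint and the 31 * x + y combination are illustrative assumptions, not a prescribed formula.

```java
// Sketch: no cached state at all; the hash is recomputed from the immutable
// fields on every call, so nothing extra is stored per instance.
final class SimplePoint {
    private final int x;
    private final int y;

    SimplePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimplePoint)) return false;
        SimplePoint other = (SimplePoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y; // a multiply and an add: usually too cheap to be worth caching
    }
}
```

If profiling later shows hashing is a bottleneck, caching can be added inside hashCode() without changing any callers.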

Regulations require me to refer you to the "premature optimization is the root of all evil" quote at this point.

Michael Petrotta
+16  A: 

Are there any considerations for immutable types regarding hash codes?

Immutable types are the easiest types to hash correctly; most hash code bugs happen when hashing mutable data. The most important thing is that hashing and equality agree; if two instances compare as equal, they should have the same hash code. (The reverse is not necessarily true; two instances that have the same hash need not be equal.)
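To illustrate both halves of that rule, here is a hypothetical immutable Fraction type in Java (where hashCode() and equals() follow the same contract as .NET's GetHashCode() and Equals()). Because 1/2 and 2/4 compare as equal, they must hash equally; normalizing to lowest terms in the constructor makes that automatic.

```java
// Sketch: immutable fraction; equal values (1/2, 2/4) must produce equal hash codes.
// Normalizing in the constructor lets equals() and hashCode() both use the stored fields.
final class Fraction {
    private final long num;
    private final long den;

    Fraction(long num, long den) {
        if (den == 0) throw new IllegalArgumentException("denominator is zero");
        if (den < 0) { num = -num; den = -den; } // keep the sign on the numerator
        long g = gcd(Math.abs(num), den);        // g >= 1 because den > 0 here
        this.num = num / g;                      // store in lowest terms
        this.den = den / g;                      // (Long.MIN_VALUE overflow ignored in this sketch)
    }

    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Fraction)) return false;
        Fraction other = (Fraction) o;
        return num == other.num && den == other.den; // same fields as hashCode()
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(num) + Long.hashCode(den); // same fields as equals()
    }
}
```

Hashing the raw constructor arguments instead would give 1/2 and 2/4 different hash codes while equals() calls them equal. Conversely, 1/32 and 2/1 both hash to 63 in this sketch yet are not equal, which the contract allows.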

Should I generate it once, in the constructor?

That's a performance optimization technique; by doing so, you trade increased consumption of space (for the storage of the computed value) for a possible decrease in time. I never make performance optimizations unless they are driven by realistic, customer-focused performance tests that carefully measure the performance of both options against documented goals. You should do this if your carefully-designed experiments indicate that (1) failure to do so causes you to miss your goal, and (2) doing so causes you to meet your goal.

How would you make it clear that the hash code is fixed?

I don't understand the question. A changing hash code is the exception, not the rule. Hash codes are always supposed to be unchanging. If the hash code of an object changes then the object can get "lost" in a hash table, so everyone should assume that hash codes remain stable.
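A small Java demonstration of an object getting "lost", using a deliberately broken, hypothetical MutableKey whose hash code depends on a mutable field. A java.util.HashSet lookup searches under the key's current hash code, so after the mutation the mutated key is looked up in the wrong place, and a key equal to the old state is looked up in the right place but no longer compares equal, even though the set still holds the entry.

```java
import java.util.HashSet;
import java.util.Set;

// Deliberately broken, hypothetical type: its hash depends on a mutable field.
final class MutableKey {
    int id; // mutable on purpose

    MutableKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

final class LostKeyDemo {
    // Returns {found by the mutated key, found by a key equal to the old state}.
    static boolean[] lookupsAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.id = 2; // the stored key's hash code changes from 1 to 2
        return new boolean[] {
            set.contains(key),              // false: searched under hash 2, stored under hash 1
            set.contains(new MutableKey(1)) // false: hash 1 matches, but equals() now fails
        };
    }
}
```

An immutable key cannot get into this state, which is why a stable hash code is the default expectation.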

is it better to use a property called HashCode, instead of GetHashCode method?

What consumer of your object is going to say "well, I could call GetHashCode(), a method guaranteed to be on all objects, but instead I'm going to call this HashCode getter that does exactly the same thing"? Do you have such a consumer in mind?

If you don't have any consumers of functionality, then don't provide the functionality.

Eric Lippert
+2  A: 

I know from my personal experience that developers are really good at misjudging performance issues.

So it is recommended to keep everything as simple as possible and calculate the hash code on the fly in GetHashCode().

Rinat Abdullin