Why does C# not implement GetHashCode for Collections?

+4 Q:

I am porting something from Java to C#. In Java the hashcode of an ArrayList depends on the items in it. In C# I always get the same hashcode from a List...

Why is this?

For some of my objects the hashcode needs to be different because the objects in their list property make the objects non-equal. I would expect that a hashcode is always unique for the object's state and only equals another hashcode when the object is equal. Am I wrong?

Asking why is too philosophical. Create a helper method (perhaps an extension method) and calculate the hashcode however you like, for example by XORing the elements' hashcodes.
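A minimal sketch of such a helper, assuming it lives in a hypothetical static class named SequenceHash (neither method is a framework API). The first method uses the 31 * h + e recurrence that Java documents for List.hashCode(), so it is order-sensitive. The second is the plain XOR from the comment, which ignores order and lets equal elements cancel out in pairs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the names are illustrative, not part of .NET.
public static class SequenceHash
{
    // Order-sensitive: same recurrence as Java's documented List.hashCode().
    public static int GetSequenceHashCode<T>(this IEnumerable<T> items)
    {
        unchecked // let the multiplication wrap around instead of throwing
        {
            int hash = 1;
            foreach (T item in items)
                hash = 31 * hash + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }

    // Order-insensitive: XOR of the element hashcodes.
    public static int GetXorHashCode<T>(this IEnumerable<T> items)
    {
        int hash = 0;
        foreach (T item in items)
            hash ^= item == null ? 0 : item.GetHashCode();
        return hash;
    }
}

public static class SequenceHashDemo
{
    public static void Main()
    {
        var a = new List<int> { 1, 2 };
        var b = new List<int> { 2, 1 };

        // Different order, different hash with the order-sensitive variant.
        Console.WriteLine(a.GetSequenceHashCode() == b.GetSequenceHashCode()); // False

        // XOR cannot tell the two orders apart.
        Console.WriteLine(a.GetXorHashCode() == b.GetXorHashCode()); // True
    }
}
```

Which variant fits depends on whether element order is part of your notion of equality. Either way the cost is a full pass over the collection on every call.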

Andrey 2010-05-25 18:30:20

+3 A:

It is not possible for a hashcode to be unique across all variations of most non-trivial classes. In C# the concept of List equality is not the same as in Java (see here), so the hash code implementation is also not the same - it mirrors the C# List equality.
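To illustrate the difference in equality, here is a small sketch (the class name ListEqualityDemo is made up for the example). List<T> does not override Equals or GetHashCode, so two lists with the same contents are not equal unless they are the same instance, and a list's hashcode stays the same when its contents change. Content comparison has to be requested explicitly, for example with LINQ's SequenceEqual.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListEqualityDemo
{
    public static void Main()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 1, 2, 3 };

        // Equals is inherited from object: reference equality.
        Console.WriteLine(a.Equals(b));        // False

        // Element-by-element comparison must be asked for explicitly.
        Console.WriteLine(a.SequenceEqual(b)); // True

        // The inherited hashcode does not depend on the contents.
        int before = a.GetHashCode();
        a.Add(4);
        Console.WriteLine(before == a.GetHashCode()); // True
    }
}
```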

Yishai 2010-05-25 18:32:45

+7 A:

Yes, you are wrong. In both Java and C#, being equal implies having the same hash-code, but the converse is not (necessarily) true.

See GetHashCode for more information.
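A small sketch of the one-way relationship, using a hypothetical Point class whose hashcode is simply X ^ Y (chosen here because it makes collisions easy to see, not because it is a good hash). Equal points always share a hashcode, but (1, 2) and (2, 1) also share one without being equal.

```csharp
using System;

// Hypothetical value-like class for illustration only.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    public bool Equals(Point other) => other != null && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Point);

    // Deliberately simple hash: consistent with Equals, but collision-prone.
    public override int GetHashCode() => X ^ Y;
}

public static class CollisionDemo
{
    public static void Main()
    {
        var p = new Point(1, 2);
        var q = new Point(2, 1);
        var r = new Point(1, 2);

        // Equal objects: hashcodes must match.
        Console.WriteLine(p.Equals(r) && p.GetHashCode() == r.GetHashCode()); // True

        // Same hashcode, yet not equal: a collision.
        Console.WriteLine(p.GetHashCode() == q.GetHashCode()); // True
        Console.WriteLine(p.Equals(q));                        // False
    }
}
```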

BlueRaja - Danny Pflughoeft 2010-05-25 18:38:29

+2 A:

You're only partly wrong. You're definitely wrong when you think that equal hashcodes means equal objects, but equal objects must have equal hashcodes, which means that if the hashcodes differ, so do the objects.

CPerkins 2010-05-25 18:52:32

@Thomas - thanks for the edit.

CPerkins 2010-05-25 19:18:57

+4 A:

In order to work correctly, hashcodes must be immutable – an object's hash code must never change.

If an object's hashcode does change, any dictionaries containing the object will stop working.

Since collections are not immutable, they cannot implement GetHashCode.
Instead, they inherit the default GetHashCode, which returns a (hopefully) unique value for each instance of an object. (Typically based on a memory address)
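The dictionary problem can be sketched as follows. ListKey is a hypothetical wrapper that gives a list a content-based Equals and GetHashCode. Once the key is mutated after insertion, the dictionary looks for it under the new hashcode and no longer finds it, even when handed the very same instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type with content-based equality and hashing.
public sealed class ListKey
{
    public List<int> Items { get; } = new List<int>();

    public override bool Equals(object obj) =>
        obj is ListKey other && Items.SequenceEqual(other.Items);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 1;
            foreach (int item in Items)
                hash = 31 * hash + item;
            return hash;
        }
    }
}

public static class MutatedKeyDemo
{
    // Inserts a key, mutates it, then asks the dictionary for the same instance.
    public static bool LookupAfterMutation()
    {
        var key = new ListKey();
        key.Items.Add(1);
        key.Items.Add(2);

        var map = new Dictionary<ListKey, string> { [key] = "value" };

        key.Items.Add(3); // the hashcode of the key changes here

        return map.ContainsKey(key);
    }

    public static void Main()
    {
        Console.WriteLine(LookupAfterMutation()); // False
    }
}
```

The entry is still inside the dictionary, but it was filed under the old hashcode, so lookups with the mutated key miss it.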

SLaks 2010-05-26 13:16:14

+1 - only answer to mention dictionaries and immutable hashcodes

Steve Dennis 2010-05-26 13:38:13

Wish I could -1. If the hashcode of a mutable object does not change, it will violate the equals contract when two objects change from different (hashcode doesn't matter) to equal (hashcode absolutely must be the same, though it most likely was not the same before the change). It is the responsibility of the coder to make sure that a mutable object does not change at all while its hashcode is in use. If an object that is contained in a hashed dictionary is changed, this is a bug, regardless of the hashcode implementation.

Alex 2010-07-22 14:17:52

@Alex: I meant that a mutable object should not have a hashcode, period.

SLaks 2010-07-22 14:27:01

I did understand, but that's not necessary - a mutable object may have a hashcode, but then it is necessary to never change it while the hashcode is used. And that's easy, too: set a readonly flag in "GetHashCode", add a ClearReadOnly method, and check in all writing methods and properties: "if (readonly) { throw new ReadonlyException(); }". This way, you are forced to think and explicitly call ClearReadOnly() before an object can be changed. It is easier to have hashed objects immutable, but not necessary. Having to list mutable hashless objects is a very good performance killer.
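A rough sketch of the pattern described in this comment, under a few assumptions: FreezableList is a made-up type, and the standard InvalidOperationException stands in for the ReadonlyException mentioned above, which is not a framework type. Asking for the hashcode freezes the object, and writes throw until ClearReadOnly() is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type illustrating the "freeze once hashed" idea.
public sealed class FreezableList
{
    private readonly List<int> items = new List<int>();
    private bool isReadOnly;

    public void Add(int item)
    {
        // Stand-in for the ReadonlyException named in the comment.
        if (isReadOnly)
            throw new InvalidOperationException("Call ClearReadOnly() before modifying a hashed object.");
        items.Add(item);
    }

    // Explicit opt-in to mutation; the caller must first remove the object from any hash-based container.
    public void ClearReadOnly() => isReadOnly = false;

    public override bool Equals(object obj) =>
        obj is FreezableList other && items.SequenceEqual(other.items);

    public override int GetHashCode()
    {
        isReadOnly = true; // the hashcode is now "in use"
        unchecked
        {
            int hash = 1;
            foreach (int item in items)
                hash = 31 * hash + item;
            return hash;
        }
    }
}

public static class FreezableDemo
{
    public static void Main()
    {
        var list = new FreezableList();
        list.Add(1);
        int hash = list.GetHashCode();

        try { list.Add(2); }
        catch (InvalidOperationException) { Console.WriteLine("frozen"); }

        list.ClearReadOnly();
        list.Add(2);
        Console.WriteLine(list.GetHashCode() != hash); // True
    }
}
```

Note that the flag only guards against accidental writes. It cannot know whether the object is actually still sitting in a dictionary when ClearReadOnly() is called.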

Alex 2010-07-22 15:08:40

The core reasons are performance and human nature: people tend to think of hashes as something fast, but computing one normally requires traversing all elements of an object at least once.

Example: If you use a string as a key in a hash table, every query has complexity O(|s|). Use strings twice as long and it will cost you at least twice as much. Imagine that it was a full-blown tree (just a list of lists) - oops :-)

If full, deep hash calculation were a standard operation on a collection, an enormous percentage of programmers would just use it unwittingly and then blame the framework and the virtual machine for being slow. For something as expensive as a full traversal, it is crucial that the programmer is aware of the complexity. The only way to achieve that is to make sure that you have to write your own. It's a good deterrent as well :-)

Another reason is updating tactics. Calculating and updating a hash on the fly vs. doing the full calculation every time requires a judgement call depending on the concrete case at hand.
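One way to picture the two tactics, as a sketch only: the hypothetical RunningHashSet below keeps an XOR-based hash up to date on every Add and Remove, and also offers a full recomputation for comparison. XOR is used because it can be undone in O(1); an order-sensitive hash would need a different scheme.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical set wrapper that maintains its hash incrementally.
public sealed class RunningHashSet
{
    private readonly HashSet<int> items = new HashSet<int>();
    private int runningHash;

    public bool Add(int item)
    {
        if (!items.Add(item)) return false; // already present, hash unchanged
        runningHash ^= item.GetHashCode();
        return true;
    }

    public bool Remove(int item)
    {
        if (!items.Remove(item)) return false; // not present, hash unchanged
        runningHash ^= item.GetHashCode();     // XOR is its own inverse
        return true;
    }

    // O(1): hash maintained on the fly.
    public int IncrementalHash => runningHash;

    // O(n): full traversal every time.
    public int RecomputeHash()
    {
        int hash = 0;
        foreach (int item in items)
            hash ^= item.GetHashCode();
        return hash;
    }
}

public static class RunningHashDemo
{
    public static void Main()
    {
        var set = new RunningHashSet();
        set.Add(3);
        set.Add(5);
        set.Add(9);
        set.Remove(5);

        Console.WriteLine(set.IncrementalHash == set.RecomputeHash()); // True
    }
}
```

The incremental version pays a little on every write to make reads cheap; the recomputing version does the opposite. Which one wins depends on how often the collection changes relative to how often its hash is needed.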

Immutability is just an academic cop-out: people use hashes as a way of detecting a change faster (file hashes, for example) and also use hashes for complex structures which change all the time. Hashes have many more uses beyond the 101 basics. The key, again, is that what to use as the hash of a complex object has to be a judgement call on a case-by-case basis.

Using an object's address (actually a handle, so it doesn't change after GC) as a hash is actually the case where the hash value remains the same for an arbitrary mutable object :-) The reason C# does it is that it's cheap and, again, nudges people to calculate their own.

ZXX 2010-07-31 00:50:12
