views: 259

answers: 2

Given that .NET has the ability to detect bitness via IntPtr (looking through Reflector, a good amount of it is marked unsafe, though, which is a shame), I've been thinking that GetHashCode returning an int is potentially short-sighted.

I know that ultimately, with a good hashing algorithm, the billions of values offered by Int32 are absolutely adequate, but even so, the narrower the possible set of hashes, the slower hashed key lookups become, since more collisions mean more linear searching within buckets.
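
To illustrate (a contrived sketch - BadKey is just a made-up type, not anything in the framework): if every key produces the same hash, Dictionary degenerates into a linear scan of a single bucket, calling Equals on every candidate:

using System;
using System.Collections.Generic;

// Hypothetical key type whose hash space has collapsed to a single value.
struct BadKey : IEquatable<BadKey>
{
    public long Value;
    public bool Equals(BadKey other) { return Value == other.Value; }
    public override bool Equals(object obj) { return obj is BadKey && Equals((BadKey)obj); }
    public override int GetHashCode() { return 0; } // zero useful bits of hash: every key collides
}

class CollisionDemo
{
    static void Main()
    {
        var map = new Dictionary<BadKey, string>();
        // Every insert and lookup now scans one ever-growing bucket,
        // comparing keys with Equals - O(n) instead of O(1).
        for (long i = 0; i < 10000; i++)
            map[new BadKey { Value = i }] = i.ToString();
    }
}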

Equally - am I the only one who finds this amusing:

struct Int64 {
  public override int GetHashCode()
  {
    return (((int) this) ^ ((int) (this >> 0x20)));
  }
}

Whilst Int32's implementation simply returns this.

If IntPtr is out of the question because of performance concerns, perhaps an IHashCode interface that implements IEquatable etc. would be better?

As our platforms get larger and larger in terms of memory capacity, disk size etc., surely the days of 32-bit hashes being enough are numbered?

Or is it simply the case that the overhead involved in either abstracting out the hash via interfaces, or adapting the size of the hash to the platform, outweighs any potential performance benefits?

+5  A: 

The Int64 hash function is there to make sure that all the bits are considered - so basically it is XORing the top 32 bits with the bottom 32 bits. I can't really imagine a better general-purpose one. (Truncating to Int32 would be no good - how could you then properly hash 64-bit values which had all zeros in the lower 32 bits?)
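
A sketch of the difference (the helper names here are mine, not the BCL's) - folding keeps the high bits in play, while truncation guarantees collisions for values that differ only above bit 31:

using System;

static class HashDemo
{
    // Fold both halves of the 64-bit value into 32 bits, as Int64.GetHashCode does.
    static int FoldHash(long value)
    {
        return unchecked((int)value ^ (int)(value >> 32));
    }

    // Naive alternative: just keep the low 32 bits.
    static int TruncateHash(long value)
    {
        return unchecked((int)value);
    }

    static void Main()
    {
        long a = 0x0000000100000000; // differs from b only in the high 32 bits
        long b = 0x0000000200000000;

        Console.WriteLine(TruncateHash(a) == TruncateHash(b)); // True  - guaranteed collision
        Console.WriteLine(FoldHash(a) == FoldHash(b));         // False - the high bits still contribute
    }
}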

If IntPtr were used as the hash return value, then code would have to have conditional branches (is it 32-bit? is it 64-bit? etc), which would slow down the hash functions, defeating the whole point.
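
Roughly the kind of per-call check that implies (just a sketch of the idea, not anything the framework actually does):

using System;

static class PlatformHash
{
    // Hypothetical platform-sized hash: the branch on IntPtr.Size runs on every call,
    // which is exactly the overhead described above.
    static IntPtr GetPlatformHashCode(long value)
    {
        if (IntPtr.Size == 8)
            return new IntPtr(value);                                      // 64-bit process: keep all the bits
        else
            return new IntPtr(unchecked((int)value ^ (int)(value >> 32))); // 32-bit process: fold down
    }

    static void Main()
    {
        Console.WriteLine(GetPlatformHashCode(0x123456789ABCDEF0));
    }
}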

I would say that if you have a hashtable which actually has 2 billion buckets, you're probably at the stage of writing an entire custom system anyway. (Possibly a database would be a better choice?) At that size, making sure the buckets were filled evenly would be a more pressing concern. (In other words, a better hash function would probably pay more dividends than a larger number of buckets).

There would be nothing to stop you implementing a base class which did have an equivalent 64-bit hash function, if you did want a multi-gigabyte map in memory. You'd have to write your own Dictionary equivalent however.
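
A bare-bones sketch of what that could look like (IHashCode64 and Dictionary64 are invented names - nothing like this exists in the BCL, and real CLR arrays are still bound by the runtime's object-size limits):

using System;
using System.Collections.Generic;

// Hypothetical 64-bit hashing contract, along the lines the question suggests.
interface IHashCode64<T> : IEquatable<T>
{
    long GetHashCode64();
}

// Minimal Dictionary-style container keyed on the 64-bit hash.
// Only bucket selection, insert and lookup are shown; growth, removal etc. are omitted.
class Dictionary64<TKey, TValue> where TKey : IHashCode64<TKey>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;

    public Dictionary64(long bucketCount)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
    }

    private List<KeyValuePair<TKey, TValue>> BucketFor(TKey key)
    {
        // Reduce the full 64-bit hash to an index into the bucket array.
        long index = (key.GetHashCode64() & long.MaxValue) % buckets.LongLength;
        return buckets[index] ?? (buckets[index] = new List<KeyValuePair<TKey, TValue>>());
    }

    public void Add(TKey key, TValue value)
    {
        BucketFor(key).Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        foreach (var pair in BucketFor(key))
        {
            if (pair.Key.Equals(key)) { value = pair.Value; return true; }
        }
        value = default(TValue);
        return false;
    }
}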

stusmith
+1 for pragmatism
kdgregory
Yes, I understand that XORing them together makes sure all the bits get considered - that makes a lot of sense. Interestingly, if you look at IntPtr - used for things like method handles - it simply truncates to an int. That's great if you've got handles to data in the upper 32 bits of memory and you're using them as keys! I take your point about conditional branching - you couldn't make the 32-bit/64-bit hash transparent to the code that generates it. I also take your point about writing a new data structure to store more data - which I guess is where you would ultimately have to take it.
Andras Zoltan
+3  A: 

You do realize that the hash code returned by GetHashCode is used for addressing into a hash table? Using a bigger data type would be a futile exercise, since actual hash tables are far smaller than the 32-bit hash space anyway. The additional information would simply be wasted because it cannot be used adequately.

Common hash tables have on the order of a few thousand to a few million entries. A 32-bit integer is more than sufficient to cover this range of indices.
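
Concretely, the reduction a typical hash table performs looks something like this (a sketch of the idea, not the actual Dictionary source) - the 32-bit hash is immediately squashed down to a bucket index far smaller than 2^32, so wider hashes would buy nothing:

using System;

class BucketDemo
{
    static void Main()
    {
        // Roughly a million buckets - about as large as common in-memory tables get.
        int bucketCount = 1000003;

        int hash = "some key".GetHashCode();

        // Clear the sign bit, then take the remainder; everything beyond the
        // bucket count is discarded at this point.
        int bucket = (hash & 0x7FFFFFFF) % bucketCount;

        Console.WriteLine(bucket); // always in [0, bucketCount)
    }
}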

Konrad Rudolph
Well - that's not quite true - a hash code of 2,034,242,111 doesn't get used as an index. Unlike an array, there is nothing other than memory that restricts a hashtable's size - theoretically there's no reason it couldn't have 10 billion elements, even with a 32-bit hash. Bring on a machine with a couple of hundred gigs of RAM (okay, let's say a terabyte) and we could fill it with such a huge hashtable. Whether you would - or would create some other structure instead - is another story!
Andras Zoltan
@Andras: how is that any different from a normal array (hint: it isn't)? And yes, you *could* have 10 billion elements - just as with a normal array - but that simply doesn't scale on *any* current architecture. Complicating the whole .NET architecture for the *one* machine worldwide that can handle 1 TB of main memory doesn't sound like a good trade-off to me. The point is: architectures necessarily involve trade-offs, and doubling the size of an address may be a big deal.
Konrad Rudolph