http://msdn.microsoft.com/en-us/library/system.object.gethashcode(VS.80).aspx says:

For the best performance, a hash function must generate a random distribution for all input.

Does this actually have any effect on performance, or is it OK to use a function (like `return this.Id`) that doesn't give a "random distribution" but also doesn't cause more collisions?

+3  A: 

`return this.Id` would usually be fine (especially if `Id` is immutable and unique) - the main idea is to avoid collisions. However, think also about pending data - what is the `Id` of the 27 rows you haven't saved yet?

Also note that the GetHashCode and Equals implementations must agree.
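
To make that last point concrete, here's a minimal sketch, assuming a hypothetical `Customer` class whose `Id` is an immutable `int`:

    public class Customer
    {
        private readonly int id; // assumed immutable and unique

        public Customer(int id) { this.id = id; }

        public int Id { get { return id; } }

        // Hash code derived directly from the Id.
        public override int GetHashCode()
        {
            return this.Id;
        }

        // Must agree with GetHashCode: objects that compare equal
        // must return the same hash code.
        public override bool Equals(object obj)
        {
            Customer other = obj as Customer;
            return other != null && other.Id == this.Id;
        }
    }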

Marc Gravell
+1  A: 

Using `this.Id` is generally okay. The underlying idea is that you don't want too many values ending up in the same bucket. The bucket number is generally obtained by taking the hash code mod x, where x is the number of buckets in your hash table and is usually prime (or a probable prime).

If you're just using increasing IDs (1, 2, 3, 4...) that's going to end up being pretty random as far as the bucket distribution is concerned. It's only if your ID followed a pattern which might well give the same bucket number for lots of entries that you'd need to worry.
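
For illustration, a quick sketch of that bucketing (the bucket count of 7 is an arbitrary choice here; real hash tables pick their own):

    using System;

    class BucketDemo
    {
        static void Main()
        {
            int buckets = 7; // bucket counts are typically prime

            // Sequential Ids cycle evenly across all the buckets.
            for (int id = 1; id <= 10; id++)
            {
                Console.WriteLine("Id {0} -> bucket {1}", id, id % buckets);
            }
        }
    }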

Jon Skeet
A: 

Seems poorly worded... I think they mean hashcodes should be "uniformly distributed" over all possible int values (.net experts please correct me if I'm wrong), which would help minimize collisions.

Here's an illustration: Suppose all my hashcodes were in the range 1 through 10. If I were to use the hashcode to compute an array index where the array has length 100, then I can only get at most 10 different indices. This means my array is poorly utilized, and I would get a lot of collisions.
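
A quick sketch of that scenario, using the numbers from the example (hash codes in 1-10 against an array of length 100):

    using System;

    class SmallRangeDemo
    {
        static void Main()
        {
            int arrayLength = 100;

            // Hash codes confined to 1..10 reach only 10 of the
            // 100 available indices, no matter how many items there are.
            for (int hash = 1; hash <= 10; hash++)
            {
                Console.WriteLine("hash {0} -> index {1}", hash, hash % arrayLength);
            }
        }
    }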

Zach Scrivena
Having lots of hashcodes in the same small range would be a problem - but [1-100] (each number once) is just as good as 100 values spread evenly across all values.
Jon Skeet
I think that's true if you use the "low-order" bits as your indices (as in your answer). If I used the "high-order" bits, then I wouldn't do so well.
Zach Scrivena
In fact, it could be better... with a dense distribution, you are likely to get a reasonable spread between buckets. If you get unlucky with a sparse (but equally-spaced) distribution, you could get all the data in one bucket. Very much a corner case, though - and it fixes itself once the buckets resize.
Marc Gravell
@Marc: As in Jon's comments, I think this depends on how you convert hashcodes to indices.
Zach Scrivena
Note that int.GetHashCode() is "return this;", and many things are keyed by int...
Marc Gravell
@Marc: True, because there is no a priori distribution to speak of, so we can assume all ints are equally likely, i.e. uniformly distributed.
Zach Scrivena
A: 

It could have an effect on, e.g., hashtables that pick buckets based on the high bits of the hash (not common). Also, if your ids are, for example, all divisible by four, a hashtable that hashes into bucket `hash_code % buckets` would use only every fourth bucket whenever the bucket count is itself divisible by four (e.g. a power of two).
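
A sketch of that effect, contrasting a bucket count divisible by four with a prime one (the counts 16 and 17 are arbitrary choices for illustration):

    using System;
    using System.Linq;

    class DivisibleIdsDemo
    {
        static void Main()
        {
            // Ids that are all divisible by four: 4, 8, 12, ..., 400.
            var ids = Enumerable.Range(1, 100).Select(i => i * 4).ToArray();

            // With 16 buckets (also divisible by four), only every
            // fourth bucket is ever used: 0, 4, 8, 12.
            Console.WriteLine(string.Join(", ",
                ids.Select(id => id % 16).Distinct().OrderBy(b => b)));

            // With 17 buckets (prime), every bucket gets used.
            Console.WriteLine(ids.Select(id => id % 17).Distinct().Count()); // 17
        }
    }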

jpalecek
Note that int.GetHashCode() is "return this;", and many things are keyed by int...
Marc Gravell
A: 

I prefer to use

this.Id.GetHashCode();

I think this makes it more likely that hashes will be distributed properly as opposed to using Id directly.
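
Assuming `Id` is an `int`, the two expressions happen to be identical in practice, since `Int32.GetHashCode()` returns the value itself; a quick check:

    using System;

    class IdHashDemo
    {
        static void Main()
        {
            int id = 42;

            // Int32.GetHashCode() returns the value itself, so for an
            // int Id both expressions yield the same hash code.
            Console.WriteLine(id);               // 42
            Console.WriteLine(id.GetHashCode()); // 42
        }
    }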

dbkk
Well, it is likely that `Id` is an int - and int.GetHashCode() is `return this;`...
Marc Gravell
@Marc, sure, but we're not supposed to care about the implementation details of the id's type. So `id.GetHashCode()` and `id` are the same in practice, but different in principle.
dbkk