ansaurus

Question

Critique this C# Hashmap Implementation?

Answer 1

+5 A:

A Remove method should never throw an exception. You are trying to remove an item, and no harm is done if it has already been removed. The generic collection classes in .NET (for example Dictionary<TKey, TValue> and anything implementing ICollection<T>) return a bool from Remove to indicate whether an item was actually removed.
Do not throw a plain Exception; throw a specific one. Browse the exceptions in the System.Collections namespaces (and the ArgumentException family in System) to find suitable ones.
Add a TryGetValue method.
Use KeyValuePair<TKey, TValue>, which is already part of .NET, instead of creating your own pair type.
Add a constructor that lets the caller define the map size.
When throwing exceptions, include details about why the exception was thrown. For instance, instead of writing "This key exists", write string.Format("Key '{0}' already exists", key).
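The suggestions above can be sketched as a small chained map. This is a minimal illustration, not the asker's code: the class name SimpleHashMap, the default capacity of 16, and the use of List<KeyValuePair<TKey, TValue>> per bucket are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chained map illustrating the list of suggestions; not the original code.
public class SimpleHashMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] _buckets;
    private readonly IEqualityComparer<TKey> _comparer = EqualityComparer<TKey>.Default;

    // The caller chooses the number of buckets instead of relying on a hard-coded size.
    public SimpleHashMap(int capacity = 16)
    {
        if (capacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(capacity), capacity, "Capacity must be positive.");
        _buckets = new List<KeyValuePair<TKey, TValue>>[capacity];
    }

    private int BucketIndex(TKey key)
    {
        // Clear the sign bit so the modulo result is never negative.
        return (_comparer.GetHashCode(key) & 0x7FFFFFFF) % _buckets.Length;
    }

    public void Add(TKey key, TValue value)
    {
        int i = BucketIndex(key);
        var bucket = _buckets[i] ?? (_buckets[i] = new List<KeyValuePair<TKey, TValue>>());
        foreach (var pair in bucket)
        {
            if (_comparer.Equals(pair.Key, key))
                // Specific exception type, with the offending key in the message.
                throw new ArgumentException(string.Format("Key '{0}' already exists", key), nameof(key));
        }
        bucket.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Lookup that reports absence through the return value instead of an exception.
    public bool TryGetValue(TKey key, out TValue value)
    {
        var bucket = _buckets[BucketIndex(key)];
        if (bucket != null)
        {
            foreach (var pair in bucket)
            {
                if (_comparer.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }

    // Returns false, rather than throwing, when the key is not present.
    public bool Remove(TKey key)
    {
        var bucket = _buckets[BucketIndex(key)];
        if (bucket == null) return false;
        for (int j = 0; j < bucket.Count; j++)
        {
            if (_comparer.Equals(bucket[j].Key, key))
            {
                bucket.RemoveAt(j);
                return true;
            }
        }
        return false;
    }
}
```

Calling Remove twice on the same key simply returns true and then false, which mirrors how Dictionary<TKey, TValue>.Remove behaves.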

jgauffin 2010-09-06 17:24:17

Answer 2

A:

Sorry to say this, but this class won't work as a HashMap, or even as a simple dictionary.

First of all, the value returned from GetHashCode() is not unique. Two different objects, e.g. two strings, can return the same hash code. Using the hash code as the array index therefore leads to lost records whenever hash codes collide. I suggest reading the MSDN documentation on the GetHashCode() method and how to implement it. An obvious example: if you take the hash codes of all Int64 values starting at 0, collisions are guaranteed at some point, because there are 2^64 Int64 values but only 2^32 possible int hash codes.
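A small demonstration of such a collision, assuming the current .NET implementation of Int64.GetHashCode(), which XORs the low and high 32-bit halves of the value. That formula is a runtime implementation detail, not a documented guarantee; the pigeonhole argument above guarantees collisions regardless. The class name CollisionDemo is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CollisionDemo
{
    // Two distinct Int64 values. Under the XOR-of-halves implementation,
    // both hash to 0 (B has low word 1 and high word 1, and 1 ^ 1 == 0).
    public static readonly long A = 0L;
    public static readonly long B = 0x0000000100000001L;

    public static bool HashesCollide()
    {
        return A != B && A.GetHashCode() == B.GetHashCode();
    }

    // A correct map keeps both entries, because within a bucket it compares
    // keys with Equals rather than trusting the hash code alone.
    public static int CountStored()
    {
        var d = new Dictionary<long, string> { { A, "first" }, { B, "second" } };
        return d.Count;
    }
}
```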

Another thing is that the for-loop lookup is slow. You should consider using binary search for lookup. To do so, you must keep your key-value pairs sorted by key at all times, which implies that you should use a List instead of an array for storage, so that when adding a new key-value pair you can insert it at the appropriate index.
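A minimal sketch of the sorted-list approach described here, using the real List<T>.BinarySearch method, which returns the bitwise complement of the insertion point when the item is absent. Note that this is a sorted-array map (the same general idea as SortedList<TKey, TValue>), not a hash map, and the poster later withdrew this critique of the original code. The class name SortedPairList is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sorted map: keys kept in order, found by binary search.
// Lookups are O(log n), but each insert shifts later elements, so inserts are O(n).
public class SortedPairList<TKey, TValue> where TKey : IComparable<TKey>
{
    private readonly List<TKey> _keys = new List<TKey>();
    private readonly List<TValue> _values = new List<TValue>();

    public IReadOnlyList<TKey> Keys => _keys;

    public void Add(TKey key, TValue value)
    {
        int i = _keys.BinarySearch(key);
        if (i >= 0)
            throw new ArgumentException(string.Format("Key '{0}' already exists", key), nameof(key));
        i = ~i; // complement of the insertion point that keeps the list sorted
        _keys.Insert(i, key);
        _values.Insert(i, value);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int i = _keys.BinarySearch(key);
        if (i >= 0)
        {
            value = _values[i];
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```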

Above all, when you code a real hash map, remember that different keys can have the same hash code, and never do the lookup with a for-loop from 0 to len-1.

tia 2010-09-06 17:34:55

Isn't that what chaining is for?

Dr.HappyPants 2010-09-06 17:46:59

@tia. This is precisely how GetHashCode() is intended to be used. I would suggest you read about the GetHashCode() method a bit more yourself. Hash maps of arbitrary objects (rather than those for which a perfect hash can be generated because they come from a known set) always have a risk of hash collision, for which there are various techniques such as reprobing and chaining (the querent is using chaining here). The person implementing GetHashCode() should try to minimize collisions, but the person using it must deal with the fact that collisions can still happen.

Jon Hanna 2010-09-06 18:03:18
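Jon Hanna's comment names reprobing as the main alternative to chaining. A minimal sketch of reprobing in its simplest form, linear probing (on a collision, try the next slot). To stay short it assumes a fixed capacity and no removal; removal under open addressing needs tombstones or re-insertion, which this sketch does not attempt. The class name LinearProbingMap is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical open-addressing table with linear probing; fixed size, no removal.
public class LinearProbingMap<TKey, TValue>
{
    private readonly TKey[] _keys;
    private readonly TValue[] _values;
    private readonly bool[] _used;
    private readonly IEqualityComparer<TKey> _cmp = EqualityComparer<TKey>.Default;

    public LinearProbingMap(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _keys = new TKey[capacity];
        _values = new TValue[capacity];
        _used = new bool[capacity];
    }

    // Returns the slot holding key, or the first empty slot on its probe path,
    // or -1 if every slot is occupied by other keys.
    private int FindSlot(TKey key)
    {
        int start = (_cmp.GetHashCode(key) & 0x7FFFFFFF) % _keys.Length;
        for (int n = 0; n < _keys.Length; n++)
        {
            int i = (start + n) % _keys.Length;
            if (!_used[i] || _cmp.Equals(_keys[i], key)) return i;
        }
        return -1;
    }

    // Inserts or overwrites.
    public void Set(TKey key, TValue value)
    {
        int i = FindSlot(key);
        if (i < 0) throw new InvalidOperationException("Table is full.");
        _keys[i] = key;
        _values[i] = value;
        _used[i] = true;
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int i = FindSlot(key);
        if (i >= 0 && _used[i])
        {
            value = _values[i];
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```

Because nothing is ever removed, a probe that reaches an empty slot proves the key is absent; that is the property removal would break.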

You might want to look at the code again, and at why this answer has a -1.

Dr.HappyPants 2010-09-08 15:56:13

Yes, my bad, and I apologize for skimming your code too lightly. My comment above is completely inaccurate.

tia 2010-09-08 18:53:27

Answer 3

+3 A:

Your hash method has a fixed range. This means that a single item could cause 214748 buckets to be created (if its hash code rehashed to 214747). A more common (and almost always better) approach is either to start with an initial size that is known (from knowledge of the domain) to be big enough for all values, or to start small and have the hash map resize itself as appropriate. With reprobing, the obvious measure of a need to resize is how much reprobing was needed. With chaining, as you are experimenting with here, you'll want to keep both average and maximum chain lengths down. This keeps down your worst-case lookup time, and hence keeps your average lookup time closer to the best case of O(1).
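A minimal sketch of the "start small and resize" approach for a chained map. It assumes an initial size of 4 buckets and a load-factor threshold of 0.75, doubling the bucket array and rehashing every entry when the threshold would be exceeded. Both numbers are illustrative choices, and the sketch only bounds the average chain length; checking the maximum chain length as well would be a further refinement. The class name ResizingMap is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chained map that grows by doubling when Count / buckets would exceed MaxLoad.
public class ResizingMap<TKey, TValue>
{
    private const double MaxLoad = 0.75; // assumed threshold
    private List<KeyValuePair<TKey, TValue>>[] _buckets = new List<KeyValuePair<TKey, TValue>>[4];
    private readonly IEqualityComparer<TKey> _cmp = EqualityComparer<TKey>.Default;

    public int Count { get; private set; }
    public int BucketCount { get { return _buckets.Length; } }

    private static int IndexFor(int hash, int length) { return (hash & 0x7FFFFFFF) % length; }

    public bool ContainsKey(TKey key)
    {
        var chain = _buckets[IndexFor(_cmp.GetHashCode(key), _buckets.Length)];
        if (chain == null) return false;
        foreach (var p in chain)
            if (_cmp.Equals(p.Key, key)) return true;
        return false;
    }

    public void Add(TKey key, TValue value)
    {
        if (ContainsKey(key))
            throw new ArgumentException(string.Format("Key '{0}' already exists", key), nameof(key));
        // Grow before inserting so the load factor stays at or below MaxLoad.
        if (Count + 1 > _buckets.Length * MaxLoad) Resize(_buckets.Length * 2);
        Insert(_buckets, key, value);
        Count++;
    }

    private void Insert(List<KeyValuePair<TKey, TValue>>[] buckets, TKey key, TValue value)
    {
        int i = IndexFor(_cmp.GetHashCode(key), buckets.Length);
        if (buckets[i] == null) buckets[i] = new List<KeyValuePair<TKey, TValue>>();
        buckets[i].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Every entry is rehashed, because its bucket index depends on the table size.
    private void Resize(int newSize)
    {
        var newBuckets = new List<KeyValuePair<TKey, TValue>>[newSize];
        foreach (var chain in _buckets)
            if (chain != null)
                foreach (var p in chain) Insert(newBuckets, p.Key, p.Value);
        _buckets = newBuckets;
    }
}
```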

The two most common approaches to such hashing (and hence to initial table size) are to use either prime numbers or powers of two. The former is considered (though there is some contention on the point) to offer better distribution of keys, while the latter allows faster computation: both cases take the input hash modulo the table size, but when the size is a power of 2, the modulo (for a non-negative hash) can be done as a bitwise AND with size - 1. Another advantage of using a power of two when you are chaining is that it's possible to test a chain to see whether resizing the table would actually split that chain or not (if you have an 8-bucket table and there's a chain whose hashes are all either 17, 1 or 33, then doubling the table size would still leave them in the same chain, but quadrupling it would redistribute them).
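The bucket arithmetic in this paragraph can be checked directly. A minimal sketch, assuming the table size is always a power of two; the names PowerOfTwoBuckets, BucketFor and WouldSplit are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PowerOfTwoBuckets
{
    // For a power-of-two size, hash & (size - 1) gives the same bucket as hash % size
    // for non-negative hashes, without a division. For negative hashes, % would give
    // a negative index, while & always yields a value in [0, size - 1].
    public static int BucketFor(int hash, int size)
    {
        return hash & (size - 1);
    }

    // Would moving this chain into a table of newSize spread it over more than one bucket?
    public static bool WouldSplit(IEnumerable<int> chainHashes, int newSize)
    {
        return chainHashes.Select(h => BucketFor(h, newSize)).Distinct().Count() > 1;
    }
}
```

With the hashes 17, 1 and 33, all three land in bucket 1 of an 8-bucket table and still share bucket 1 at size 16, but at size 32 the hash 17 moves to bucket 17, so only quadrupling splits the chain.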

You don't have a method offering replace semantics, which is usual with .NET dictionary types (where Add throws if there's already an item with that key, but assigning through the indexer replaces the existing value).
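For reference, this is how the BCL's Dictionary<TKey, TValue> behaves: the indexer setter replaces, while Add with an existing key throws ArgumentException. The wrapper class ReplaceSemanticsDemo is hypothetical; the Dictionary calls are the real API.

```csharp
using System;
using System.Collections.Generic;

public static class ReplaceSemanticsDemo
{
    public static string Run()
    {
        var d = new Dictionary<string, int>();
        d.Add("a", 1);
        d["a"] = 2; // indexer setter: replaces the existing value
        try
        {
            d.Add("a", 3); // Add with an existing key throws ArgumentException
        }
        catch (ArgumentException)
        {
            return "Add threw; value is " + d["a"];
        }
        return "Add did not throw";
    }
}
```

Run() returns "Add threw; value is 2".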

Your error on a retrieval that would try to go beyond the number of buckets will make no sense to the user, who doesn't care whether the bucket existed, only whether the key did (they need not know how your implementation works at all). Both cases where a key isn't found should throw the same error; System.Collections.Generic.KeyNotFoundException has precisely the right semantics, so you could reuse that.
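A minimal sketch of an indexer that throws the same KeyNotFoundException whether the key's bucket is empty or merely lacks the key, and whose setter adds or replaces. The class name IndexedMap and the fixed 16 buckets are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chained map with a Dictionary-style indexer.
public class IndexedMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] _buckets = new List<KeyValuePair<TKey, TValue>>[16];
    private readonly IEqualityComparer<TKey> _cmp = EqualityComparer<TKey>.Default;

    private List<KeyValuePair<TKey, TValue>> Chain(TKey key, bool create)
    {
        int i = (_cmp.GetHashCode(key) & 0x7FFFFFFF) % _buckets.Length;
        if (_buckets[i] == null && create) _buckets[i] = new List<KeyValuePair<TKey, TValue>>();
        return _buckets[i];
    }

    public TValue this[TKey key]
    {
        get
        {
            var chain = Chain(key, false);
            if (chain != null)
                foreach (var p in chain)
                    if (_cmp.Equals(p.Key, key)) return p.Value;
            // One exception for "not found", independent of the bucket layout.
            throw new KeyNotFoundException(string.Format("Key '{0}' was not found", key));
        }
        set
        {
            var chain = Chain(key, true);
            for (int j = 0; j < chain.Count; j++)
            {
                if (_cmp.Equals(chain[j].Key, key))
                {
                    // Replace semantics: overwrite instead of throwing.
                    chain[j] = new KeyValuePair<TKey, TValue>(key, value);
                    return;
                }
            }
            chain.Add(new KeyValuePair<TKey, TValue>(key, value));
        }
    }
}
```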

Using a List is rather heavy in this case. Generally I'd frown on anyone saying a BCL collection was too heavy, but when it comes to rolling your own collections, it's generally either because (1) you want to learn from the exercise or (2) the BCL collections don't suit your purposes. In case (1) you should learn how to complete the job you started, and in case (2) you need to be sure that List doesn't have whatever failing you found with Dictionary.
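One lighter alternative to a List per bucket is a hand-rolled singly linked chain, where each bucket holds only a reference to its first node, so an empty bucket costs a single null reference. This is a sketch under that assumption, with the hypothetical names NodeChainMap and Node; the BCL's own Dictionary<TKey, TValue> goes further, and in its reference source stores all entries in one array linked by integer indices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chained map whose buckets are singly linked lists of nodes.
public class NodeChainMap<TKey, TValue>
{
    private sealed class Node
    {
        public readonly TKey Key;
        public TValue Value;
        public Node Next;
        public Node(TKey key, TValue value, Node next) { Key = key; Value = value; Next = next; }
    }

    private readonly Node[] _buckets;
    private readonly IEqualityComparer<TKey> _cmp = EqualityComparer<TKey>.Default;

    public NodeChainMap(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _buckets = new Node[capacity];
    }

    private int IndexFor(TKey key) { return (_cmp.GetHashCode(key) & 0x7FFFFFFF) % _buckets.Length; }

    public void Add(TKey key, TValue value)
    {
        int i = IndexFor(key);
        for (Node n = _buckets[i]; n != null; n = n.Next)
            if (_cmp.Equals(n.Key, key))
                throw new ArgumentException(string.Format("Key '{0}' already exists", key), nameof(key));
        // Prepend the new node: O(1) once the duplicate check has passed.
        _buckets[i] = new Node(key, value, _buckets[i]);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        for (Node n = _buckets[IndexFor(key)]; n != null; n = n.Next)
        {
            if (_cmp.Equals(n.Key, key))
            {
                value = n.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}
```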

Your removal throws both a nonsensical error for someone who doesn't know about the implementation details and an inconsistent error (whether something else existed in that bucket is not something they should care about). Since removing a non-existent item isn't harmful, it is more common to merely return a bool indicating whether the item had been present, and let the user decide whether that indicates an error. It is also wasteful to keep searching the entire bucket after the item has been removed.
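A minimal sketch of removal from a linked chain that returns a bool and stops at the first match, since keys are unique within a map. For brevity it assumes int keys and string values; the names ChainRemoval and Node are hypothetical.

```csharp
using System;

public static class ChainRemoval
{
    // Hypothetical chain node with int keys, for brevity.
    public sealed class Node
    {
        public int Key;
        public string Value;
        public Node Next;
    }

    // Unlinks the first node with a matching key and stops searching.
    // Returns true if a node was removed, false if the key was absent.
    public static bool Remove(ref Node head, int key)
    {
        Node prev = null;
        for (Node n = head; n != null; prev = n, n = n.Next)
        {
            if (n.Key == key)
            {
                if (prev == null) head = n.Next;
                else prev.Next = n.Next;
                return true; // keys are unique, so nothing further to search
            }
        }
        return false;
    }
}
```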

Your implementation does not allow null keys, which is reasonable enough (indeed, the documentation for IDictionary<TKey, TValue> says that implementations may or may not allow them). However, you reject them by letting the NullReferenceException caused by calling GetHashCode() on null propagate, rather than checking and throwing an ArgumentNullException. Receiving a NullReferenceException suggests to the user that the collection itself was null, so this is a clear bug.
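A minimal sketch of the explicit check: reject a null key up front with ArgumentNullException before calling GetHashCode(). The helper name KeyGuard.BucketIndex is hypothetical; in a real map this check would sit at the top of Add, the indexer, TryGetValue and Remove.

```csharp
using System;

public static class KeyGuard
{
    // Computes a bucket index, rejecting null keys with ArgumentNullException
    // instead of letting key.GetHashCode() throw NullReferenceException.
    public static int BucketIndex<TKey>(TKey key, int bucketCount)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        return (key.GetHashCode() & 0x7FFFFFFF) % bucketCount;
    }
}
```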

Jon Hanna 2010-09-06 17:59:10
