tags:

views:

324

answers:

10

I don't have experience with hash tables outside of arrays/dictionaries in dynamic languages, so I recently found out that internally they're implemented by making a hash of the key and using that to store the value. What I don't understand is why aren't the values stored with the key (string, number, whatever) as the, well, key, instead of making a hash of it and storing that.

A: 

In some cases, it's possible that the key is very long or large, making it impractical to keep copies of these keys. Hashing them first allows for less memory usage as well as quicker lookup times.

Aaron
The key is always stored in the hashmap as well. Otherwise you couldn't handle hash collisions (because you couldn't check whether the passed-in key is really equal to the stored key). So you don't save memory.
sepp2k
+1  A: 

Generally the point of a hash table is to store some sparse value -- i.e. there is a large space of keys and a small number of things to store. Think about strings. There are an uncountable number of possible strings. If you are storing the variable names used in a program then there is a relatively small number of those possible strings that you are actually using, even though you don't know in advance what they are.

Joel
Technically, there are countably infinite possible strings of finite length. ;)
Mike Daniels
Of course, silly me :-)
Joel
A: 

A hashtable is used to store a set of values and their keys in a (for some amount of time) constant number of spots. In a simple case, let's say you wanted to save every integer from 0 to 10000 using the hash function of i % 10.

This would make a hashtable of 1000 blocks (often an array), each having a list 10 elements deep. So if you were to search for 1234, it would immediately know to search in the table entry for 123, then start comparing to find the exact match. Granted, this isn't much better than just using an array of 10000 elements, but it's just to demonstrate.

Hashtables are very useful for when you don't know exactly how many elements you'll have, but there will be a good number fewer collisions on the hash function than your total number of elements. (Which makes the hash function "hash(x) = 0" very, very bad.) You may have empty spots in your table, but ideally a majority of them will have some data.

Jon
+3  A: 

This is a near duplicate: Why do we use a hashcode in a hashtable instead of an index?

Long story short, you can check if a key is already stored VERY quickly, and equally rapidly store a new mapping. Otherwise you'd have to keep a sorted list of keys, which is much slower to store and retrieve mappings from.

BobMcGee
A: 

Also consider speed. If your key is a string and your values are stored in an array, your hash can access any element in 'near' constant time. Compare that to searching for the string and its value.

dbrien
+1  A: 

The idea of a hash table is to provide a direct access to its items. So that is why the it calculates the "hash code" of the key and uses it to store the item, insted of the key itself.

The idea is to have only one hash code per key. Many times the hash function that generates the hash code is to divide a prime number and uses its remainer as the hash code.

For example, suppose you have a table with 13 positions, and an integer as the key, so you can use the following hash function

f(x) = x % 13

Carlos Loth
+2  A: 

What I don't understand is why aren't the values stored with the key (string, number, whatever) as the, well, key

And how do you implement that?

Computers know only numbers. A hash table is a table, i.e. an array and when we get right down to it, an array can only addressed via an integral nonnegative index. Everything else is trickery. Dynamic languages that let you use string keys – they use trickery.

And one such trickery, and often the most elegant, is just computing a numerical, reproducible “hash” number of the key and using that as the index.

(There are other considerations such as compaction of the key range but that’s the foremost issue.)

Konrad Rudolph
A: 

The main advantage of using a hash for the purpose of finding items in the table, as opposed to using the original key of the key-value pair (which BTW, it typically stored in the table as well, since the hash is not reversible), is that..

...it allows mapping the whole namespace of the [original] keys to the relatively small namespace of the hash values, allowing the hash-table to provide O(1) performance for retrieving items.

This O(1) performance gets a bit eroded when considering the extra time to dealing with collisions and such, but on the whole the hash table is very fast for storing and retrieving items, as opposed to a system based solely on the [original] key value, which would then typically be O(log N), with for example a binary tree (although such tree is more efficient, space-wise)

mjv
A: 

What I don't understand is why aren't the values stored with the key (string, number, whatever) as the, well, key, instead of making a hash of it and storing that.

Well, how do you propose to do that, with O(1) lookup?

The point of hashtables is basically to provide O(1) lookup by turning the key into an array index and then returning the content of the array at that index. To make that possible for arbitrary keys you need

  1. A way to turn the key into an array index (this is the hash's purpose)
  2. A way to deal with collisions (keys that have the same hash code)
  3. A way to adjust the array size when it's too small (causing too many collisions) or too big (wasting space)
Michael Borgwardt
A: 

In a nutshell: Hashing allows O(1) queries/inserts/deletes to the table. OTOH, a sorted structure (usually implemented as a balanced BST) makes the same operations take O(logn) time.

Why take a hash, you ask? How do you propose to store the key "as the key"? Ask yourself this, if you plan to store simply (key,value) pairs, how fast will your lookups/insertiongs.deletions be? Will you be running a O(n) loop over the entire array/list?

The whole point of having a hash value is that it allows all keys to be transformed into a finite set of hash values. This allows us to store keys in slots of a finite array (enabling fast operations - instead of searching the whole list you only search those keys that have the same hash value) even though the set of possible keys may be extremely large or infinite (e.g. keys can be strings, very large numbers, etc.) With a good hash function, very few keys will ever have the same hash values, and all operations are effectively O(1).

This will probably not make much sense if you are not familiar with hashing and how hashtables work. The best thing to do in that case is to consult the relevant chapter of a good algorithms/data structures book (I recommend CLRS).

MAK