What are hashtables and hashmaps and their typical use cases?

+14 A:

Well, think of it this way.

If you use an array, a simple index-based data structure, and fill it up with random stuff, finding a particular entry gets to be a more and more expensive operation as you fill it with data, since you basically have to start searching from one end toward the other, until you find the one you want.

If you want to get faster access to data, you typicall resort to sorting the array and using a binary search. This, however, while increasing the speed of looking up an existing value, makes inserting new values slow, as you need to move existing elements around when you need to insert an element in the middle.

A hashtable, on the other hand, has an associated function that takes an entry, and reduces it to a number, a hash-key. This number is then used as an index into the array, and this is where you store the entry.

A hashtable revolves around an array, which initially starts out empty. Empty does not mean zero length, the array starts out with a size, but all the elements in the array contains nothing.

Each element has two properties, data, and a key that identifies the data. For instance, a list of zip-codes of the US would be a zip-code -> name type of association. The function reduces the key, but does not consider the data.

So when you insert something into the hashtable, the function reduces the key to a number, which is used as an index into this (empty) array, and this is where you store the data, both the key, and the associated data.

Then, later, you want to find a particular entry that you know the key for, so you run the key through the same function, get its hash-key, and goes to that particular place in the hashtable and retrieves the data there.

The theory goes that the function that reduces your key to a hash-key, that number, is computationally much cheaper than the linear search.

A typical hashtable does not have an infinite number of elements available for storage, so the number is typically reduced further down to an index which fits into the size of the array. One way to do this is to simply take the modulus of the index compared to the size of the array. For an array with a size of 10, index 0-9 will map directly to an index, and index 10-19 will map down to 0-9 again, and so on.

Some keys will be reduced to the same index as an existing entry in the hashtable. At this point the actual keys are compared directly, with all the rules associated with comparing the data types of the key (ie. normal string comparison for instance). If there is a complete match, you either disregard the new data (it already exists) or you overwrite (you replace the old data for that key), or you add it (multi-valued hashtable). If there is no match, which means that though the hash keys was identical, the actual keys were not, you typically find a new location to store that key+data in.

Collision resolution has many implementations, and the simplest one is to just go to the next empty element in the array. This simple solution has other problems though, so finding the right resolution algorithm is also a good excercise for hashtables.

Hashtables can also grow, if they fill up completely (or close to), and this is usually done by creating a new array of the new size, and calculating all the indexes once more, and placing the items into the new array in their new locations.

The function that reduces the key to a number does not produce a linear value, ie. "AAA" becomes 1, then "AAB" becomes 2, so the hashtable is not sorted by any typical value.

There is a good wikipedia article available on the subject as well, here.

Lasse V. Karlsen 2008-09-26 08:25:12

Great answer - Thanks lassevk!

Kamil Zadora 2008-09-26 09:12:42

A:

You can find here all neccessary information.

2008-09-26 08:25:54

A:

Hashtables/hashmaps associate a value (called 'key' for disambiguation purposes) with another value. You can think them as kind of a dictionary (word: definition) or a database record (key: data).

ΤΖΩΤΖΙΟΥ 2008-09-26 08:26:37

+4 A:

if you are talking in terms of Java, both are collections which allow objects addition, deletion and updation and use Hasing algorithms internally.

The significant difference however, if we talk in reference to Java, is that hashtables are inherently synchronized and hence are thread safe while the hash maps are not thread safe collection.

Apart from the synchronization, the internal mechanism to store and retrieve objects is hashing in both the cases.

If you need to see how Hashing works, I would recommend a bit of googling on Data Structers and hashing techniques.

Nrj 2008-09-26 08:30:49

+7 A:

lassevk's answer is very good, but might contain a little too much detail. Here is the executive summary. I am intentionally omitting certain relevant information which you can safely ignore 99% of the time.

There is no important difference between hash tables and hash maps 99% of the time.

Hash tables are magic

Seriously. Its a magic data structure which all but guarantees three things. (There are exceptions. You can largely ignore them, although learning them someday might be useful for you.)

1) Everything in the hash table is part of a pair -- there is a key and a value. You put in and get out data by specifying the key you are operating on.

2) If you are doing anything by a single key on a hash table, it is blazingly fast. This implies that put(key,value), get(key), contains(key), and remove(key) are all really fast.

3) Generic hash tables fail at doing anything not listed in #2! (By "fail", we mean they are blazingly slow.)

When do we use hash tables?

We use hash tables when their magic fits our problem.

For example, caching frequently ends up using a hash table -- for example, let's say we have 45,000 students in a university and some process needs to hold on to records for all of them. If you routinely refer to student by ID number, then a ID => student cache makes excellent sense. The operation you are optimizing for this cache is fast lookup.

Hashes are also extraordinarily useful for storing relationships between data when you don't want to go whole hog and alter the objects themselves. For example, during course registration, it might be a good idea to be able to relate students to the classes they are taking. However, for whatever reason you might not want the Student object itself to know about that. Use a studentToClassRegistration hash and keep it around while you do whatever it is you need to do.

They also make a fairly good first choice for a data structure except when you need to do one of the following:

When Not To Use Hash Tables

Iterate over the elements. Hash tables typically do not do iteration very well. (Generic ones, that is. Particular implementations sometimes contain linked lists which are used to make iterating over them suck less. For example, in Java, LinkedHashMap lets you iterate over keys or values quickly.)

Sorting. If you can't iterate, sorting is a royal pain, too.

Going from value to key. Use two hash tables. Trust me, I just saved you a lot of pain.

Patrick McKenzie 2008-09-26 09:33:09

Lasse's answer is good but yours made more sense to me!

Tim 2009-07-16 21:02:24

ansaurus

tags:

views:

answers: