views:

51

answers:

4

I am just starting to learn hashtables, and so far, I know that you take the object you want to hash and put it through an hash function, then use the index it returns to get the corresponding object you want. There is something I don't understand though:

What structure do you use to store the objects in so you can quickly index them with the code returned by the hash function? The only thing I can think of is to use an array, but to handle all the keys, you'd have to allocate one that's 9999999999999 elements big or something ridiculous like that. Or is it as simple as iterating over a linked list or something and comparing the ID in each of the elements with the key from that hash function? And if so, that seems kindof inefficient doesn't it?

A: 

Generally it's array. If the array size is N then use hash function that returns numbers in range 0..(N-1). For example apply modulo N on the hash function result. And then use collision resolution in Wikipedia.

Drakosha
I'm not asking about collision resolution, I was asking what type of data structure the things are stored in and how to quickly index them.
Langley
A: 

There is a plenty kinds of hashing, enjoy: http://en.wikipedia.org/wiki/Hashtable

Edit:

Since I can't comment on other posts (?!), Just wanted to say the chain hashing using linked lists isn't the only way to handle collusions, you could easily think of appending at each place an AVL Tree or Even a Hash Table !!

What is more, using open addressing and rehashing is one of the most common uses and what about Universal group of hash functions. Since he worried for efficiency knowing all aspects of hasing is required.

LmSNe
+2  A: 

Normally, you use an array (or something similar like a vector). You pick a reasonable size (e.g., 20% larger than the number of items you expect) and some method of resolving collisions when/if two keys produce the same hash value (e.g., each of those locations is the head of a linked list of items that hashed to that value).

Jerry Coffin
A: 

Yes, you usually use an array but then you do a couple of things:

  1. You convert the hash code to an array index by using the remainder of the hash code divided by the array size.

  2. You make the size of the array a prime number as that makes step #1 more efficient (some hash algorithms need this to get a uniform distribution)

  3. You come up with a design to handle hash collisions. @JerryCoffin's answer gives you more detail.

R Samuel Klatchko