views:

51

answers:

5

So there is this nice picture in the hash maps article on Wikipedia:

Phonebook hashmap

Everything clear so far, except for the hash function in the middle.

  • How can a function generate the right index from any string? Are the indexes integers in reality too? If yes, how can the function output 1 for John Smith, 2 for Lisa Smith, etc.?
+3  A: 

That's one of the key problems of hashmaps/dictionaries and so on. You have to choose a good hash function. A very bad but fast hash function could be the length of the keys. You instantly see, that you will get a lot of collisions (different keys, but same hash). Another bad hash function could be the ASCII value of the first character of your key. Lot's of collisions, too.
So you need a function that is a lot better than those two. You could add (xor) all ASCII values of the key characters and mix the length in for instance. In practice you often depend on the values (fields) of the object that you want to hash (same values give same hash => value type). For reference types you can mix in a memory location for instance.

In your example that's just simplified a lot. No real hash function would map these keys to sequential numbers.

Maybe you want to read one of my previous answers to hashmaps

tanascius
A: 

The hash function most often (but not necessarily always) outputs an integer within wanted range (often parameter to the hash function). This integer can be used as an index. Notice that hash function cannot be guaranteed to always produce unique result when given different data to hash. This is called hash collision and hash algorithm must always handle it in some way.

As for your specific question, how a string becomes a number. Any string is composed of characters (J, o, h, n ...) and characters can be interpreted as numbers (in computers). ASCII and UTF standards bind certain values to certain characters, so result is deterministic and always the same on all computers. So the hash function does operation on these characters that processes them as numbers and comes up with another number (output). You could for example simply sum all the values and use modulo operation to range-limit the resulting value.

This would be quite a horrible hashing function because for example "ab" and "ba" would get same result. Design of hash function is difficult and so one should use some ready-made algorithm unless situation dictates some other solution.

Pasi Savolainen
+1  A: 

A simple hash function may be as follows:

$hash = $string[0] % HASH_TABLE_SIZE;

This function will return a number between 0 and HASH_TABLE_SIZE - 1, depending on the first letter of the string. This number can be used to go to the correct position in the hash table.

A real hash function will consider all letters in a string, and it will be designed so that there is an even spread among the buckets.

Sjoerd
A: 

All that is required of a hash function is that it returns the same integer given the same key. Technically, a hash function that always returns '1' is not incorrect.

SP
A: 

There's a really good article on how hash functions (and colision detection/resolution) on MSDN:

Part 2: The Queue, Stack, and Hashtable

You can skip down to the header Compressing Ordinal Indexing with a Hash Function

There are some bits and pieces that are .NET specific (when they talk about which Hash algorithm .NET uses by default) but for the most part it is language agnostic.

Justin Niessner