tags:

views:

685

answers:

5

After being taught how to create a hash table in class, I don't understand when hashing data would be useful. It seems to me that all hashing does is storing information in semi-random positions in an array. I want to know how any of the data can be made useful after it's stored.

My question is this: what are some examples where hashing information is beneficial? How is data retrieved in any organized manner? It seems to be placed in arbitrary positions where it would be difficult to retrieve.

+10  A: 

Hashing can be used for many purposes:

  1. It can be used to compare large amounts of data. You create the hashes for the data, store the hashes and later if you want to compare the data, you just compare the hashes.

  2. Hashes can be used to index data. They can be used in hash tables to point to the correct row. If you want to quickly find a record, you calculate the hash of the data and directly go to the record where the corresponding hash record is pointing to. (This assumes that you have a sorted list of hashes that point to the actual records)

  3. They can be used in cryptographic applications like digital signatures.

  4. Hashing can be used to generate seemingly random strings.

Here are the applications of hash functions that wikipedia lists:

  1. Finding duplicate records
  2. Finding similar records
  3. Finding similar substrings
  4. Geometric hashing

Now regarding hash table, here are some points to note:

If you are using a hash table, the hashes in the table should be in a sorted manner. If not, you will have to create an index on the hash column. Some implementations store the hash separately in a sorted manner and point to the original record.

If somebody is storing hashes in a semi-random order, it must be either because of the above reasons or because they just want to store the message digest of the information for comparing, finding duplicates etc. and not as an index to the data.

Niyaz
#1) If you look at some query plans in SQL Server, you'll see that hashing is very, very important for queries where there's not enough indexing to cover your columns.
Dave Markle
Dave, Thanks for adding that.
Niyaz
A: 

There are a couple of typical reasons to hash data. In the example you reference, you would hash the data and use that as a key to extract the actual value of the hashed item. The hashed data is often referred to as a key and it references a bucket where the actual, non-hashed value can be found.

The other typical reason is to create a signature of the hashed value so that you can check if the value has been changed by someone else. Since it's usually rare, depending on the algorithm used, to have two items hash to the same value, you can rehash a value and compare it to the saved hash value to check if the item is still the same.

tvanfosson
+6  A: 
Simucal
A: 

Hashing is a technique useful for fast key lookup. It allows one to more efficiently find a value rather than scanning a list from beginning to end.

mP
A: 

Have you ever used a dictionary or a set? They're typically implemented in terms of a hashtable because the value associated with a key can be found quickly.

{
'WA': 'Washington',
'WV': 'West Virginia',
'WY': 'Wyoming'
}
George V. Reilly