Guys, I have a data structure with 25 distinct integer keys and a value. I have a list of these objects (say 50,000) and I intend to use a hash table to store and retrieve them. I am planning to take one of these approaches.

  1. Create an integer hash from these 25 integer keys and store it in a hash table. (Yeah! I have some means to handle collisions.)

  2. Concatenate the individual keys into a string and use it as the key for the hash table. For example, if the key values are 1,2,4,6,7 then the hash key would be "12467".
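For illustration, the second approach could look roughly like the sketch below. The function name `make_string_key` and the `std::vector<int>` input are hypothetical. The `','` separator is also an assumption that is not in the question: without it, the keys 1,2,4,6,7 and 12,4,6,7 would both produce "12467".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the integer keys into one string key.
// The ',' separator is an assumption added for this sketch so that
// {1, 2, 4, 6, 7} and {12, 4, 6, 7} do not both become "12467".
std::string make_string_key(const std::vector<int>& keys) {
    std::string out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i != 0) out += ',';
        out += std::to_string(keys[i]);
    }
    return out;
}
```

With this helper, `make_string_key({1, 2, 4, 6, 7})` yields `"1,2,4,6,7"`. Every lookup then has to build the string and compare it character by character, which is the cost the question is asking about.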

Assuming that I have a total of 50,000 records, each with 25 distinct keys and a value, will my second approach be overkill given the cost of the string comparisons needed to retrieve and insert a record?

Some more information!

  1. Each bucket in the hash table is a balanced binary tree.
  2. I am using the boost library's hash_combine method to create the hash from the 25 keys.
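
As a rough sketch of the first approach, the 25 keys can be folded into a single `std::size_t`. To keep the example compilable without Boost, `hash_combine_sketch` below is a hypothetical stand-in for `boost::hash_combine`. It follows the widely quoted formula from older Boost releases, and newer releases may mix bits differently. The `KeySet` alias is likewise an assumption about how the keys are stored.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Assumed layout: the 25 integer keys of one record.
using KeySet = std::array<int, 25>;

// Hypothetical stand-in for boost::hash_combine (not the Boost function itself).
// Mixes the hash of one value into the running seed.
inline void hash_combine_sketch(std::size_t& seed, int value) {
    seed ^= std::hash<int>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Folds all 25 keys into one hash value.
std::size_t hash_keys(const KeySet& keys) {
    std::size_t seed = 0;
    for (int k : keys) {
        hash_combine_sketch(seed, k);
    }
    return seed;
}
```

Equal key sets always give equal hashes, but two different key sets can still share a hash, so the full 25 keys have to be kept with the record to resolve collisions.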
A: 

Absolutely use the first method. If you use the second and treat the concatenated number as the position in the table, you will need a hash table with 1x10^(25m) slots, where m is the maximum number of digits in a key.

For example, if the largest possible key is 9999, m would be 4 and you'd need 1x10^100 slots in your table.


Explanation:

The idea behind a hash table is that you can randomly access any element in O(1) time (collisions aside), because an element's hash is in fact its position in the hash table. So for example, if I hash Object X and a hash of 24 is returned (or some string hash which is converted to a number, which turns out to be 24), I simply go to slot 24 of my table (often implemented as an array) and retrieve Object X.
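
Here is a minimal sketch of that lookup, using direct addressing: the hash is used as the array index with no further reduction. The class name `DirectTable`, the string payload and the fixed slot count are all assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Direct-address table: the hash itself is the slot index.
// Hypothetical class for illustration only.
class DirectTable {
public:
    explicit DirectTable(std::size_t slots)
        : values_(slots), used_(slots, false) {}

    // Returns false when the hash has no slot, i.e. the table is too small.
    bool put(std::size_t hash, const std::string& value) {
        if (hash >= values_.size()) return false;
        values_[hash] = value;
        used_[hash] = true;
        return true;
    }

    // O(1): one bounds check and one array access, no searching.
    // Returns nullptr when the slot is empty or out of range.
    const std::string* get(std::size_t hash) const {
        if (hash >= values_.size() || !used_[hash]) return nullptr;
        return &values_[hash];
    }

private:
    std::vector<std::string> values_;
    std::vector<bool> used_;
};
```

With `DirectTable t(32)`, `t.put(24, "Object X")` stores the object in slot 24 and `t.get(24)` finds it in one step, while a hash of 100 is rejected because the table has no slot for it.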

But if you were using your second method (concatenating 25 numbers together to make the hash; we'll say single digits to simplify things here), the largest hash would be 9999999999999999999999999. Therefore, to retrieve that object from the hash table, you'd have to retrieve it from position 9999999999999999999999999, which means your table must have at least that many slots.


And remember, with the first one, since each bucket is a balanced binary tree, collisions won't really be that big a deal. The worst case is a retrieval/insertion cost of O(log n), which isn't really that bad anyway.
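
To make the bucket idea concrete, here is a small sketch where each bucket is a `std::map`, which guarantees logarithmic lookup and is typically implemented as a balanced tree. The class name `TreeBucketTable`, the `int` value type, the inline hash-mixing loop and the modulo reduction to a bucket index are all assumptions for illustration, not the asker's actual code.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <vector>

// Assumed layout: the 25 integer keys of one record.
using KeySet = std::array<int, 25>;

// Hypothetical hash table whose buckets are ordered maps.
// bucket_count must be greater than zero.
class TreeBucketTable {
public:
    explicit TreeBucketTable(std::size_t bucket_count)
        : buckets_(bucket_count) {}

    // Inserts or overwrites the value stored for this key set.
    void insert(const KeySet& keys, int value) {
        buckets_[bucket_of(keys)][keys] = value;
    }

    // Returns nullptr when the key set is not present.
    // Cost: O(1) to pick the bucket, O(log n) inside the bucket.
    const int* find(const KeySet& keys) const {
        const std::map<KeySet, int>& bucket = buckets_[bucket_of(keys)];
        auto it = bucket.find(keys);
        return it == bucket.end() ? nullptr : &it->second;
    }

private:
    // hash_combine-style mixing (assumed formula), reduced to a bucket index.
    std::size_t bucket_of(const KeySet& keys) const {
        std::size_t seed = 0;
        for (int k : keys) {
            seed ^= static_cast<std::size_t>(k) + 0x9e3779b9
                    + (seed << 6) + (seed >> 2);
        }
        return seed % buckets_.size();
    }

    std::vector<std::map<KeySet, int>> buckets_;
};
```

Even if every record landed in the same bucket, a lookup would still only cost O(log n) comparisons of 25-integer arrays.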

Cam