views:

437

answers:

3

I need to count a lot of different items. I'm processing a list of pairs such as:

A34223,34
B23423,-23
23423212,16

What I was planning to do was hash the first value (the key) into a 32bit integer which will then be a key to a sparse structure where the 'value' will be added (all start at zero) number and be negative.

Given that they keys are short and alphanumeric, is there an way to generate a hash algorithm that is fast on 32bit x86 architectures? Or is there an existing suitable hash?

I don't know anything about the design of hashes, but was hoping that due to the simple input, there would be a way of generating a high performance hash that guarantees no collision for a given key length of "X" and has high dispersion so minimizes collisions when length exceeds "X".

+7  A: 

As you are using C++, the first thing you should do is to create a trivial implimentation using std::map. Is it fast enough (it probably will be)? If so, stick with it, otherwise investigate if your C++ implementation provides a hash table. If it does, use it to create a trivial implementation, test, time it. Is it fast enough (almost certainly yes)?

Only after you hav eexhausted these options should you think of implementing your own hash table and hashing function.

anon
Thanks. You're right. I should try trivial one first. The hashing piece is a separate feature in the program which is reasonably OK performance wise. So long as this doesn't add more than 33% to the run time, I should be OK.
+1  A: 

A guarantee for no collisions is difficult.

In your case, the keys

A34223
B23423
23423212

can be translated to 32-bit integers with little effort.

And here is a good function that generate hashes from strings:

/**
 *  "The Practice of Programming", Hash Tables, section 2.9, pg. 57
 *
 *  computes hash value of string
 */
DWORD
strhash( char* str )
{
  //#define MULTIPLIER 31 or 37
  unsigned int   h;
  unsigned char* p;

  h = 0;
  for ( p=(unsigned char*)str; *p != '\0'; p++ )
    h = 31 * h + *p; // <- FIXED MULTIPLIER

  return h;
}
Nick D
+1  A: 

Check Bob Jenkin's website for good hash functions. IIRC it is the same hash that is used in Perl.

florin