tags:
views: 115
answers: 6

Hi All,

I'm working on a high performance application where all calls must be justified. I have a map that is used once in the beginning of each transaction to do a lookup that I would like to improve upon. The map is loaded at startup and does not change after that.

The key in the map below is an std::string but it can be changed to a char array if needed. C or C++ as a solution is fine.

  typedef stdext::hash_map<std::string, int> symbols_t;

Does anyone know of any other solutions that would eliminate the lookup or be faster?

Thanks ahead of time for your help.

Additional info from edits:
1. The hash_map currently has 350,000 elements.
2. Each key value is typically between 4 and 10 characters in length.
3. Information is received on a callback from a third party API. The callback is given a symbol that is used as the key value when doing the map lookup. The rest of the software is driven off of the int returned from the map lookup.

THANKS: Thanks all for your input. You've given me a few avenues to explore. I will definitely try these out. I appreciate the help.

+1  A: 

I would say we lack information here to reliably tell you what to do.

You may want to be more specific about what the lookup is for and the overall algorithmic cost of your functions.

If you clutter the code with ugly hacks to win 1 constant microsecond in a function whose algorithmic cost is O(n²) where it could be O(n), you're wasting your time on the wrong problem.

Without additional details, we can't really tell.

ereOn
@ereOn - I added some additional info. Hope it helps and it is enough :)
skimobear
+1  A: 

Hand-code a hash-map that is more tuned to your data.

  1. simplistic hash function that is good enough
  2. use a sparse C-array that is big enough to not have collisions for your data
  3. make sure all calls are inlined
  4. Make sure you never copy or convert strings
  5. Write code to generate the C-source for this C array. It's going to look like (using 0 for no entry):

    int symbols[] = { 0,0,0,0,0,0,5,0,0,0,0,0,3,0,0,0,0,0,0,2 /* etc */ };
    

    The code you write can search for a hash function where there are no collisions for your data. Perhaps it's something as simple as the first two chars of the symbol (or the first 4) interpreted as an int. If you don't care about space, you don't need to make a perfect hash for all possible data, just a fast one that's perfect for the data you have.

The array index is then simple_hash(const string& s).

Remember that if you change the symbols, you may have to rewrite the hash and certainly need to regenerate the table.

EDIT: based on @blaze's answer -- the code in #5 is written for you and is called gperf

Lou Franco
+1  A: 

If you really do need a hash_map keyed on strings, then you could try customizing the hash function. If your strings are mostly unique in (say) the first four characters, then write a custom hash function that only looks at up to the first four characters in a string, and make the hash_map use that. Here's an example:

// needs <cstdint> and <cstring>
struct CustomStringHash : std::unary_function<std::string, size_t>
{
    size_t operator()(const std::string & s) const
    {
         switch (s.size())
         {
              case 0:
                   return 0;
              case 1:
                   return s[0] + 1;
              case 2:
                   return (s[0] << 8) + s[1];
              default: //3 or more chars long, plus a terminating null
              {
                   uint32_t h;
                   std::memcpy(&h, s.c_str(), 4); //avoids the unaligned read a raw reinterpret_cast can make
                   return h;
              }
         }
    }
};

If your strings are 8-12 characters on average, and mostly unique in the first four characters, then customizing the hash function could speed up lookups quite significantly.

Doug
+1  A: 

How can we advise you how to eliminate your lookup since you don't tell us what you look up or why? We'd need far more algorithmic detail.

As for performance, whether or not to use a hash_map depends on the complexity involved. A good hash_map implementation gives O(1) lookup and insertion, but the constant overhead can be quite high. With few entries you could suffer here and might benefit from a std::map. You could also suffer from cache-coherence problems if many different elements of the map are accessed frequently, and could consider some sort of sorted array instead.

DeadMG
@DeadMG - added some additional info above. pls let me know if its not enough. thx
skimobear
+2  A: 

Is this map completely constant, or does it change between program invocations? For constant key sets (known at compile time) there is the gperf program, which generates a fast, guaranteed O(1) lookup table.

Also, it could help us understand your problem if you tell us why and how exactly the map lookups slow down your code.
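For reference, a gperf input file using the -t (struct) option looks roughly like this; the symbol/value pairs here are made up:

```
%{
#include <string.h>
%}
struct symbol { const char *name; int value; };
%%
IBM, 5
MSFT, 3
AAPL, 7
```

Running `gperf -t symbols.gperf > symbols_hash.c` generates a perfect-hash `in_word_set()` function that returns the matching struct (or NULL) with no collisions for the listed keys.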

blaze
The contents of the hash_map changes on a daily basis. It is pulled out of a database each morning. That sounds interesting, I'll take a look :)
skimobear
gperf generates C++ source files that are hardcoded with your data. Use gperf to create a dynamic library from your database that you unload and load each morning.
Lou Franco
+1  A: 

A hash table is usually fast enough, with O(1) lookup, and we cannot tell you whether you can get rid of it without knowing the whole structure of your application. It may not be possible.

I don't know how stdext::hash_map<std::string, T> is implemented, but a prefix tree (trie) is a possibly better solution. It is equivalent to a hash table with a perfect hash function.

      s
      |
      t
    /   \
   o     a
   |     |
(p,42)   r
         |
       (t,69)

It will give you the value corresponding to your string in at most 10 iterations (the maximum length of the key) and will minimize the space cost of storing the keys.

Ugo