ansaurus

Question

Most Efficient way to 'look up' Keywords

Answer 1

+2 A:

A "trie" will surely be the most efficient way.

Ben Voigt 2010-09-21 04:21:39
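
For readers unfamiliar with the trie suggestion above, here is a minimal sketch of what it could look like. It is not the poster's code: the `KeywordTrie` class and its restriction to lowercase ASCII letters are assumptions made to keep the example short.

```cpp
#include <array>
#include <memory>
#include <string>

// Minimal trie over lowercase ASCII keywords (an assumption made for this
// sketch; a real lexer would likely also need '_' and upper case).
class KeywordTrie {
public:
    // Returns false, without marking a keyword, if the word contains a
    // character outside 'a'..'z'.
    bool insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            auto& child = node->children[c - 'a'];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->is_keyword = true;
        return true;
    }

    // Walks at most one node per input character, so the cost is bounded by
    // the length of the token, not by the number of keywords stored.
    bool contains(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            node = node->children[c - 'a'].get();
            if (!node) return false;
        }
        return node->is_keyword;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> children{};
        bool is_keyword = false;
    };
    Node root_;
};
```

A lookup for a non-keyword usually stops after the first character or two, because most identifiers leave the trie almost immediately.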

Answer 2

+4 A:

If your set of keywords is fixed, a perfect hash can be built for O(1) lookup. Check out gperf or cmph.

ergosys 2010-09-21 04:26:15

You'd still have hash collisions with non-keywords, so I don't see this being more efficient than other methods. It's also not O(1): true, the complexity doesn't depend on the number of keywords, but it does depend on the length of each keyword.

Ben Voigt 2010-09-21 04:41:16

Verification after the lookup is a string compare, but this is unlikely to be a significant factor in performance. Since the hash is perfect, there is no hash collision penalty: the input either matches the hashed slot or it doesn't, and no additional search is needed.

ergosys 2010-09-21 04:56:02
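
The exchange above can be made concrete with a small hand-built example. This is only a sketch of the idea, not output from gperf or cmph: the five keywords, the ten-slot table, and the hash function `keyword_hash` are all hypothetical, and the hash was picked by hand so that these particular words land in distinct slots.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fixed keyword set. The hash below was chosen by hand so that
// these five words land in five different slots of a ten-slot table; a tool
// such as gperf searches for a function like this automatically.
namespace {

const std::size_t kSlots = 10;

// Slot i holds the keyword whose hash is i, or nullptr if the slot is unused.
const char* const kTable[kSlots] = {
    "while",   // 0
    nullptr,   // 1
    "else",    // 2
    nullptr,   // 3
    "return",  // 4
    nullptr,   // 5
    "for",     // 6
    "if",      // 7
    nullptr,   // 8
    nullptr,   // 9
};

// Uses only the first and last characters, so hashing costs the same for
// every non-empty token regardless of its length.
std::size_t keyword_hash(const std::string& s) {
    const unsigned char first = static_cast<unsigned char>(s.front());
    const unsigned char last = static_cast<unsigned char>(s.back());
    return (first + last) % kSlots;
}

}  // namespace

// One hash, one probe, one string compare: there is no collision chain to walk.
bool is_keyword(const std::string& s) {
    if (s.empty()) return false;
    const char* candidate = kTable[keyword_hash(s)];
    return candidate != nullptr && s == candidate;
}
```

A non-keyword such as "iff" hashes to the same slot as "if", which is the collision with non-keywords raised in the first comment; the single string compare rejects it, which is the point made in the reply.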

Answer 3

+2 A:

Whatever implementation of std::map you have will probably be sufficient.

no one important 2010-09-21 05:16:59

Or `std::tr1::unordered_map` if your compiler supports it, which the latest VC++ and GCC both do. :)

Jonathan Grynspan 2010-09-21 05:42:21
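
As a rough illustration of the map-based approach, here is a sketch using `std::unordered_map`, which has been part of the standard library (header `<unordered_map>`) since C++11, so the `tr1` namespace is no longer needed on current compilers. The `Token` enum and the keyword list are hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds for an imaginary language.
enum class Token { Identifier, If, Else, While, Return };

// Built once on first use. Lookups afterwards are average-case constant time
// in the number of keywords (hashing still reads every character of the key).
const std::unordered_map<std::string, Token>& keyword_table() {
    static const std::unordered_map<std::string, Token> table = {
        {"if", Token::If},
        {"else", Token::Else},
        {"while", Token::While},
        {"return", Token::Return},
    };
    return table;
}

// Anything that is not in the table is treated as a plain identifier.
Token classify(const std::string& word) {
    const auto& table = keyword_table();
    const auto it = table.find(word);
    return it == table.end() ? Token::Identifier : it->second;
}
```

Swapping in `std::map` only requires changing the container type and header; lookups then cost a logarithmic number of string comparisons instead of one hash.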

Answer 4

A:

For single-character keywords a lookup table would be perfect. For multi-character keywords (especially if the lengths differ), use a hash table. If you need performance, you could even use source code generation to create the hash tables, using a simple hash function that ignores case or not, depending on your syntax.

So I'd implement it with a LUT and a hash table: first you check the first character with the LUT (if it's a simple operator, it would start with a non-alpha-numeric value), and, if not found, check the hash table.

ruslik 2010-09-21 06:22:08
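
A minimal sketch of the two-stage lookup described above, assuming a hypothetical set of single-character operators and a hypothetical keyword list. The names `kOperatorLut`, `kKeywords`, and `classify` are made up for this example, and the hash table here is `std::unordered_set` rather than a generated one.

```cpp
#include <array>
#include <string>
#include <unordered_set>

// Hypothetical operator and keyword sets, chosen only for illustration.
namespace {

// 256-entry table indexed by character value: true for single-character operators.
std::array<bool, 256> make_operator_lut() {
    std::array<bool, 256> lut{};  // value-initialised: every entry is false
    for (unsigned char c : std::string("+-*/=<>!")) lut[c] = true;
    return lut;
}

const std::array<bool, 256> kOperatorLut = make_operator_lut();

const std::unordered_set<std::string> kKeywords = {"if", "else", "while", "return"};

}  // namespace

enum class Kind { Operator, Keyword, Other };

// Stage 1: one-character tokens are checked against the LUT.
// Stage 2: everything else falls through to the hash table.
Kind classify(const std::string& token) {
    if (token.size() == 1 && kOperatorLut[static_cast<unsigned char>(token[0])]) {
        return Kind::Operator;
    }
    if (kKeywords.count(token) != 0) return Kind::Keyword;
    return Kind::Other;
}
```

The LUT check is a single array index, so operator tokens never pay for hashing at all.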

Answer 5

A:

This is for a language, with a specific set of keywords that never change, and there aren't very many of them?

If so, it probably doesn't matter what you use. You will have bigger fish to fry.

However, since the list doesn't change, it would be hard to beat a hard-coded search like this:

// search on first letter
switch (s[0]) {
  case 'a':
    // search on 2nd letter, etc.
    break;
  case 'b':
    // search on 2nd letter, etc.
    break;
  // ... one case per remaining first letter ...
  case '_':
    // search on 2nd letter, etc.
    break;
  default:
    // no keyword starts with this character
    break;
}

Mike Dunlavey 2010-09-21 20:58:06
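
To show how the skeleton above fills in, here is a complete version for a hypothetical four-keyword set ("do", "else", "if", "int"). The keyword set and the function name `is_keyword` are assumptions for illustration; for a real language this kind of code would more likely be generated than written by hand.

```cpp
#include <cstring>

// Hard-coded recogniser for a hypothetical four-keyword set:
// "do", "else", "if", "int". Expects a NUL-terminated token.
bool is_keyword(const char* s) {
    switch (s[0]) {
    case 'd':
        return std::strcmp(s + 1, "o") == 0;      // do
    case 'e':
        return std::strcmp(s + 1, "lse") == 0;    // else
    case 'i':
        switch (s[1]) {                           // "if" and "int" share 'i'
        case 'f':
            return s[2] == '\0';                  // if
        case 'n':
            return std::strcmp(s + 2, "t") == 0;  // int
        default:
            return false;
        }
    default:
        return false;  // also covers the empty string, where s[0] is '\0'
    }
}
```

Once the first character has narrowed the candidates to one, the rest of the token is compared against that keyword's remaining characters in a single call.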
