views:

84

answers:

4

I have some sort of dictionary file that looks like this:

 UTM University of Tennessee at Martin
 UMD University of Maryland

It is a 3 letters acronym followed by the definition, separated by newlines. In total the file has 9282 definitions.

My questions are:

1) What would be the best way to store this definitions? Should I place them in a map, in a vector, store them in an array, leave them in a txt file and scan it for the acronym i need ? Other? Speed is key here. 2) Depending on your answer, what functions should i be using in order to lookup the acronym and then to retrieve just the definition?

Thanks in advance for your help.

EDIT / New Related Question: If i didn't want my application to depend on an external txt file, what would be the best way to do it?

+1  A: 

If speed is important a hash map seems like the best choice. There is one in Boost::Unordered. Otherwise a std::map is likely to work, too.

Your other options don't seem likely choices: Keeping the information in a text file and scanning when needed would be horribly slow (linear complexity + disk access). An unsorted vector would enable faster lookup, but why? You want a map, use one.

pmr
A: 

You should use std::map (#include <map>) to provide associative one-way lookup, with your key being the acronym and your value being the full name.

You can use insert to put your elements in, and operator[] to access them.

See this reference for more information.

Platinum Azure
+3  A: 

std::map is easy and is part of basic STL. It's probably your simplest option.

If speed is really really important, you can benchmark a few options:

  • use a hash table (tr1::hash_map, or boost::unordered_map) for O(1) lookup (it requires a hash).
  • using std::map O(log n) lookups
  • create a vector<string> (or vector<const char*>) with 26^3 elements (assuming acronyms are all letters A-Z), and convert the acronym into an index.

I'd guess that the vector option would be (by far) the fastest, but it's also the least obvious, hardest to maintain, and hardest to scale to larger datasets.

You can turn const char *acronym; into an index with something like this:

const char *vector_of_names[26*26*26];

// Input 3-letter acronym, outputs the associated name.
const char *getName(const char* acronym) {
  return vector_of_names[
      ((acronyms[0]-'A') * 26*26) +
      ((acronyms[1]-'A') * 26) +
       (acronyms[2]-'A')];
}
Stephen
Comprehensive answer. I would note `std::unordered_map` as well, for those who have access to C++0x, but it would be slower than the plain array.
Matthieu M.
+1  A: 

The fastest possible lookup is probably a perfect hash table which is built ahead of time. This would require more coding than the other solutions presented, so make sure you need it before you try it.

Mark Ransom