views:

254

answers:

8

I want to store strings and issue each with a unique ID number (an index would be fine). I would only need one copy of each string and I require quick lookup. I check if the string exist in the table often enough that i notice a performance hit. Whats the best container to use for this and how do i lookup if the string exist?

+8  A: 

I would suggest tr1::unordered_map. It is implemented as a hashmap so it has an expected complexity of O(1) for lookups and a worst case of O(n). There is also a boost implementation if your compiler doesn't support tr1.

#include <string>
#include <iostream>
#include <tr1/unordered_map>

using namespace std;

int main()
{
    tr1::unordered_map<string, int> table;

    table["One"] = 1;
    table["Two"] = 2;

    cout << "find(\"One\") == " << boolalpha << (table.find("One") != table.end()) << endl; 
    cout << "find(\"Three\") == " << boolalpha << (table.find("Three") != table.end()) << endl; 

    return 0;
}
CTT
+2  A: 

sounds like an array would work just fine where the index is the index into the array. To check if it exists, just make sure the index is in bounds of the array and that its entry isn't NULL.

EDIT: if you sort the list, you could always use a binary search which should have fast lookup.

EDIT: Also, if you want to search for a string, you can always use a std::map<std::string, int> as well. This should have some decent lookup speeds.

Evan Teran
I have a performance issue with arrays, i need logN
acidzombie24
wait so you want to be able to say exists("my_string")?
Evan Teran
I thought that you wanted to be able to do: C[index] to get the string.
Evan Teran
+5  A: 

Try std::map.

Brian Neal
+2  A: 

Are the Strings to be searched available statically? You might want to look at a perfect hashing function

Hemal Pandya
+1  A: 

Easiest is to use std::map.

It works like this:

#include <map>
using namespace std;

...

   map<string, int> myContainer;
   myContainer["foo"] = 5; // map string "foo" to id 5
   // Now check if "foo" has been added to the container:
   if (myContainer.find("foo") != myContainer.end())
   {
       // Yes!
       cout << "The ID of foo is " << myContainer["foo"];
   }
   // Let's get "foo" out of it
   myContainer.erase("foo")
antti.huima
As I commented on @Teran's answer: Wouldn't it be <int, std::string>? He wants to look up given the UID, not the string.
strager
he said he wants to lookup by a string in the comment to terans answer if i got him right
Johannes Schaub - litb
well he didnt... but we guess he did :)
Johannes Schaub - litb
My bad, I misread. Sorry. =X
strager
+3  A: 

First and foremost you must be able to quantify your options. You have also told us that the main usage pattern you're interested in is lookup, not insertion.

Let N be the number of strings that you expect to be having in the table, and let C be the average number of characters in any given string present in the said table (or in the strings that are checked against the table).

  1. In the case of a hash-based approach, for each lookup you pay the following costs:

    • O(C) - calculating the hash for the string you are about to look up
    • between O(1 x C) and O(N x C), where 1..N is the cost you expect from traversing the bucket based on hash key, here multiplied by C to re-check the characters in each string against the lookup key
    • total time: between O(2 x C) and O((N + 1) x C)
  2. In the case of a std::map-based approach (which uses red-black trees), for each lookup you pay the following costs:

    • total time: between O(1 x C) and O(log(N) x C) - where O(log(N)) is the maximal tree traversal cost, and O(C) is the time that std::map's generic less<> implementation takes to recheck your lookup key during tree traversal

In the case of large values for N and in the absence of a hash function that guarantees less than log(N) collisions, or if you just want to play it safe, you're better off using a tree-based (std::map) approach. If N is small, by all means, use a hash-based approach (while still making sure that hash collision is low.)

Before making any decision, though, you should also check:

Cheers V.

vladr
A: 

Google sparse hash maybe

Andrei Taranchenko
+1  A: 
Comptrol