views:

234

answers:

2

Hi all,

I'm looking for a C++ container class, that's a lot like a multimap, but slightly different. The container would store pairs of strings. But when I retrieve items from the container using key K, I want to find all the items where K begins with the item's own key.

E.G. If I use key "abcde" I want to find items with key "adc" and "abcde", but not "abcqz".

Or in pseudo C++ form:

multimap2<string, string>  myMultiMap;
myMultiMap.insert( pair("abcde", "hello"));
myMultiMap.insert( pair("abc",   "Hi"));
myMultiMap.insert( pair("abcqz", "goodbye"));

// prints 2
cout << myMultiMap.count("abcde") << endl;

// prints "hello"  and  "Hi"
cout << myMultiMap.everything_which_matches("abcde") << endl;

// prints "Hi"
cout << myMultiMap.everything_which_matches("abc") << endl;

// prints "goodbye"
cout << myMultiMap.everything_which_matches("abcqz") << endl;

Insertion time is unimportant, but I need fast access to the items. Is it possible to do this with a normal Multimap by creating a special < operator? My hunch is that I would need the normal < operator for insertion, and a special one for retrieval.

thanks

Hugo

+11  A: 

I would suggest using a trie.

Basically you have a tree with 1 node per unique character. Your algorithm would be O(m) for both lookups and insertion, where m is the length of a string.

So following your example with:

"abcde", "hello" 
 "abc",  "Hi"
"abcqz", "goodbye"

Then you would have the following trie:

       a
       |
       b
       |
       c       (c holds data of hi)
     /  \
    d    q
    |    |
    e    z (z holds data of goodbye)    (e holds data of hello)

To do a lookup you simply start at the root node (root node not shown above), and follow the next char in your input string. Each time you reach a node that has a data result, you will include that as one of your output strings.

So a search for abcde would give you: "hi", "hello" as you wanted. It would not give you "goodbye" because you did not traverse over that result node.

Brian R. Bondy
+1  A: 

First, with std::multimap, you cannot have a different ordering for insertion and retrieval.

Second, any total ordering is not good enough for your purpose, which means it won't render the answer sets you want as intervals.

I would either search for all prefixes with one lookup each (you could optimize it by remembering the length of the next shorter prefix etc.) or use a Trie (but rather a PATRICIA trie which demands less space).

jpalecek