ansaurus

Question

Need to store string as id for objects in some fast data structure

Answer 1

+1 A:

It is possible to write your own, but you shouldn't have any problems with boost::unordered_map or std::tr1::unordered_map.

A ternary trie may be faster than a hash map for a smaller number of elements.
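For readers who have not met the structure, here is a minimal sketch of a ternary trie (ternary search tree) mapping string ids to int values. The class name TernaryTrie, its insert/find interface, and the int payload are assumptions made for illustration; they do not come from the question. Whether it beats a hash map for your data is something only measurement can tell you.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical ternary search tree keyed by non-empty strings.
class TernaryTrie {
    struct Node {
        char ch;                          // character stored at this node
        bool has_value;                   // true if a key ends here
        int value;                        // payload for that key
        std::unique_ptr<Node> lo, eq, hi; // smaller / next-char / larger
        explicit Node(char c) : ch(c), has_value(false), value(0) {}
    };
    std::unique_ptr<Node> root_;

public:
    // Inserts or overwrites key. Empty keys are ignored in this sketch.
    void insert(const std::string& key, int value) {
        if (key.empty()) return;
        std::unique_ptr<Node>* slot = &root_;
        std::size_t i = 0;
        for (;;) {
            if (!*slot) slot->reset(new Node(key[i]));
            Node* n = slot->get();
            if (key[i] < n->ch) {
                slot = &n->lo;
            } else if (key[i] > n->ch) {
                slot = &n->hi;
            } else if (i + 1 == key.size()) {
                n->has_value = true;
                n->value = value;
                return;
            } else {
                ++i;              // matched this character, move to the next
                slot = &n->eq;
            }
        }
    }

    // Returns a pointer to the stored value, or nullptr if key is absent.
    const int* find(const std::string& key) const {
        if (key.empty()) return nullptr;
        const Node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            if (key[i] < n->ch) {
                n = n->lo.get();
            } else if (key[i] > n->ch) {
                n = n->hi.get();
            } else if (i + 1 == key.size()) {
                return n->has_value ? &n->value : nullptr;
            } else {
                ++i;
                n = n->eq.get();
            }
        }
        return nullptr;
    }
};
```

Each lookup compares one character at a time and never hashes the whole key, which is where the possible advantage on small sets comes from.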

Unknown 2009-05-08 04:56:59

I agree. It sounds like unordered_map (boost version at http://www.boost.org/doc/libs/1_37_0/doc/html/boost/unordered_map.html) is the way to go. If that's unacceptable, you should explain clearly /why/, and your ideas for a faster version.

Matthew Flaschen 2009-05-08 05:13:34

James, since you're on Linux: if you're using a moderately recent version of gcc (v4.0 and above), it comes with <tr1/unordered_map>, so there is no need to use Boost. Just #include <tr1/unordered_map> and then use std::tr1::unordered_map in place of std::map.
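A minimal sketch of what that swap looks like. It is written against std::unordered_map from C++11, which has the same basic interface as the TR1 container; with the TR1 header you would include <tr1/unordered_map> and use std::tr1::unordered_map instead. The Session struct and the find_session helper are hypothetical names invented for the example.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical object stored under a string id.
struct Session {
    int user_id;
};

// Same usage pattern as std::map<std::string, Session>.
typedef std::unordered_map<std::string, Session> SessionStore;

// Returns a pointer to the session, or nullptr if the id is unknown.
const Session* find_session(const SessionStore& store, const std::string& id) {
    SessionStore::const_iterator it = store.find(id);
    return it == store.end() ? nullptr : &it->second;
}
```

Lookup is a find() call exactly as with std::map; the main visible difference is that iteration no longer visits keys in sorted order.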

sstock 2009-05-08 06:50:42

Answer 2

+5 A:

I was going to suggest a map, but I see you have already ruled this out.

"I tried using map but need something faster."

These are the std::map performance bounds courtesy of the Wikipedia page:

Searching for an element takes O(log n) time
Inserting a new element takes O(log n) time
Incrementing/decrementing an iterator takes amortized O(1) time (O(log n) in the worst case)
Iterating through every element of a map takes O(n) time
Removing a single map element takes O(log n) time
Copying an entire map takes O(n) time.

How have you measured and determined that a map is not optimised sufficiently for you? It's quite possible that any bottlenecks you are seeing are in other parts of the code, and a map is perfectly adequate.

The above bounds seem like they would fit within all but the most stringent scalability requirements.

LeopardSkinPillBoxHat 2009-05-08 04:58:37

+1 for recommending profiling to find the real performance bottleneck.

lothar 2009-05-08 16:57:30

Answer 3

A:

I think this question has already been posted and here is the link

Thunderboltz 2009-05-08 05:08:39

Answer 4

+2 A:

The right data structure depends on the data you want to access. Some questions you should ask:

How many items will be in the session store? 50? 100000? 10000000000?
How large is each item in the store (byte size)?
What kind of string input is used for the key? ASCII-7? UTF-8? UCS2? ...

Hash tables generally perform very well for lookups. You can optimize them heavily for speed by writing them yourself (and yes, you can resize the table). Suggestions to improve performance with hash tables:

Choose a good hash function! It should distribute keys evenly across your hash table and be cheap to compute (what works best will depend on the format of the key input).
If you are using buckets (chaining), make sure no bucket grows beyond a length of 6. If a bucket does exceed 6 entries, your hash function probably isn't distributing evenly enough. A bucket length of < 3 is preferable.
Watch out for how you allocate your objects. If at all possible, try to allocate them near each other in memory to take advantage of locality of reference. If you need to, write your own sub-allocator/heap manager. Also keep to aligned boundaries for better access speeds (alignment is processor/bus dependent, so you'll have to decide whether you want to target a particular processor type).
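As an illustration of the first two suggestions, here is a sketch that plugs a simple byte-wise hash (32-bit FNV-1a, chosen here only as a well-known example, not as the best choice for your keys) into an unordered_map and reports the longest bucket chain. The names fnv1a, Fnv1aHash and longest_bucket are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// 32-bit FNV-1a over the bytes of the key.
inline std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;            // FNV offset basis
    for (std::size_t i = 0; i < s.size(); ++i) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 16777619u;                       // FNV prime
    }
    return h;
}

// Adapter so the hash can be used as the container's hasher.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const { return fnv1a(s); }
};

// Returns the number of elements in the fullest bucket.
template <typename Map>
std::size_t longest_bucket(const Map& m) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        if (m.bucket_size(b) > longest) longest = m.bucket_size(b);
    }
    return longest;
}
```

Declare the table as std::unordered_map<std::string, T, Fnv1aHash>, load it with a realistic sample of your ids, and check longest_bucket against the limits mentioned above.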

B-trees are also very good and in general perform well. (Someone can insert info about B-trees here).

I'd recommend looking at the data you are storing and making sure that it is as small as possible. Use shorts, unsigned chars, and bit fields as necessary. There are other ways to squeeze out more performance as well, such as placing your string data at the end of your struct and allocating it in the same block as the struct, e.g.

struct foo {
  int a;
  char my_string[0]; // zero-length array: a compiler extension (GCC accepts it), not standard C++
                     // allocate an instance of foo to be
                     // sizeof(int) + sizeof(your string data) etc
};
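A sketch of how such a single-block allocation might look without relying on the zero-length array extension: the string is copied into the bytes immediately after the struct. Record, make_record and free_record are hypothetical names, and this assumes the struct stays trivially destructible so a plain free is enough.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Header followed in memory by a NUL-terminated copy of the string.
struct Record {
    int a;
    std::size_t len;  // length of the trailing string, without the NUL

    // The string bytes start right after the header.
    char* text() { return reinterpret_cast<char*>(this + 1); }
    const char* text() const { return reinterpret_cast<const char*>(this + 1); }
};

// Allocates header and string in one block; returns nullptr on failure.
Record* make_record(int a, const char* s) {
    const std::size_t len = std::strlen(s);
    void* mem = std::malloc(sizeof(Record) + len + 1);
    if (!mem) return nullptr;
    Record* r = new (mem) Record;  // construct the header in place
    r->a = a;
    r->len = len;
    std::memcpy(r->text(), s, len + 1);  // copy including the NUL
    return r;
}

// Record is trivially destructible, so releasing the block is sufficient.
void free_record(Record* r) {
    std::free(r);
}
```

One allocation instead of two means the id string sits on the same cache lines as the fields next to it, which is the locality benefit being described.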

You may also find that implementing your own string compare routine boosts performance dramatically, although this will depend on your input data.
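One possible shape for such a routine, as a sketch only: if your ids vary in length, comparing lengths first rejects most mismatches without touching the characters. LengthFirstLess and IdMap are hypothetical names, and whether this helps at all depends on your key distribution, so measure it.

```cpp
#include <cstring>
#include <map>
#include <string>

// Orders strings by length first, then by raw bytes.
// This is a valid strict weak ordering, but it is not alphabetical.
struct LengthFirstLess {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) return a.size() < b.size();
        return std::memcmp(a.data(), b.data(), a.size()) < 0;
    }
};

// A std::map that uses the custom comparison for its keys.
typedef std::map<std::string, int, LengthFirstLess> IdMap;
```

The trade-off is that iterating the map no longer yields keys in dictionary order, which only matters if something else in your code relies on that.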

Adam Markowitz 2009-05-08 05:08:52
