views: 149, answers: 7
I need to find a number of objects from a large container.

The only way I can think of to do that is to search the container for one item at a time in a loop. However, even with an efficient search that averages, say, "log n" (where n is the size of the container), that gives me "m log n" (where m is the number of items I'm looking for) for the entire operation.

That seems highly suboptimal to me, and as it's something I'm likely to need to do on a frequent basis, it's something I'd definitely like to improve if possible.

Neither part has been implemented yet, so I'm open to suggestions on the format of the main container, the "list" of items I'm looking for, etc., as well as the actual search algorithm.

The items are complex objects, but the search key is just a simple integer.

+5  A: 

If you're purely doing look-up (you don't require ordered elements) and can give up some memory, try unordered_map (it's TR1, also implemented in Boost), which has constant-time amortized look-up.

In a game engine, we tested std::map and unordered_map, and while map was faster for insertions (if I recall), unordered_map blew it out of the water for retrieval. We had greater than 1000 elements in the map, for scale, which is fairly low compared to some other tasks you may be doing.

If you require elements to be ordered, your next bet is std::map, which has the look-up times you've posted, and keeps the elements ordered. In general, it also uses less memory than an unordered_map.
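
For illustration, here's a minimal sketch of the look-up loop with the Boost version (the Item struct and the hard-coded keys are made up for this example; the TR1 flavour works the same way):

#include <boost/unordered_map.hpp>
#include <vector>
#include <cstdio>

struct Item { int key; double payload; };   // stand-in for the "complex object"

int main()
{
    boost::unordered_map<int, Item> index;  // key -> object, O(1) average look-up
    Item a = { 42, 1.5 };
    index[a.key] = a;                       // insert while building the container

    std::vector<int> keys;                  // the m integer keys being searched for
    keys.push_back(42);
    keys.push_back(7);

    for (std::size_t i = 0; i < keys.size(); ++i)
    {
        boost::unordered_map<int, Item>::const_iterator it = index.find(keys[i]);
        if (it != index.end())
            std::printf("found key %d\n", it->first);
    }
    return 0;
}

Each of the m look-ups is constant time on average, so the whole operation is O(m) rather than O(m log n).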

GMan
A: 

boost/tr1 unordered_map and unordered_set are containers backed by a hash table, which gives you search in amortized constant time [ O(1) ].

Boost Unordered documentation.

drspod
A: 

I suppose that if you have a sorted container and a uniform distribution of items, the most efficient method would be a recursive bisection search with an execution path somewhat like a tree: it calls itself on both halves whenever the objects being searched for fall on both sides of the bisection point (see the sketch below).

However, if you choose a container based on a hash-table (boost unordered set, I think?), or something similar, then lookup can be O(1), so searching in a loop really doesn't matter.

EDIT: note that std::map and std::set are normally (always?) implemented using rb-trees, so are only log(n) for lookup.
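
Roughly what that recursion could look like (the Item type and the function are my own sketch, not from the question): it bisects the sorted key list, finds where the middle key lands in the container with lower_bound, and only recurses into container ranges that can still hold the remaining keys.

#include <algorithm>
#include <vector>

struct Item { int key; /* other fields */ };

// comparator for lower_bound: "is this element's key less than k?"
inline bool keyLess(const Item& a, int k) { return a.key < k; }

// items: sorted by key, searched in [lo, hi)
// wanted: sorted keys to find, searched in [wlo, whi)
void findAll(const std::vector<Item>& items, int lo, int hi,
             const std::vector<int>& wanted, int wlo, int whi,
             std::vector<int>& foundIndices)
{
    if (lo >= hi || wlo >= whi)
        return;
    int wmid = (wlo + whi) / 2;   // middle key of the wanted range
    int pos = static_cast<int>(std::lower_bound(items.begin() + lo, items.begin() + hi,
                                                wanted[wmid], keyLess) - items.begin());
    if (pos < hi && items[pos].key == wanted[wmid])
        foundIndices.push_back(pos);                               // exact match
    findAll(items, lo, pos, wanted, wlo, wmid, foundIndices);      // keys smaller than wanted[wmid]
    findAll(items, pos, hi, wanted, wmid + 1, whi, foundIndices);  // keys larger than wanted[wmid]
}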

Autopulated
The way `map` and co. are implemented is an implementation detail. The standard only says how the structure should perform asymptotically. Commonly, yes, it's a red-black tree, but implementors are free to use whatever they see fit.
GMan
I thought as much, but I've never seen one that wasn't a red-black tree :)
Autopulated
That's because red-black trees are sexy. :P
GMan
+4  A: 

Hash tables have basically O(1) lookup. This gives you O(m) to look up m items; obviously you can't look up m items faster than O(m) because you need to get the results out.

antti.huima
+3  A: 

If your container is a vector and the elements are sorted, you can use std::lower_bound to search in O(log n) time. If your search items are also sorted, you can do a small optimization by always using the last found iterator as the start of the search for the next one, e.g.

vector<stuff> container;           // sorted by the integer key
vector<stuff> search_items;        // the items to find, also sorted
vector<stuff>::iterator it = container.begin();
for (size_t i = 0;  i < search_items.size() && it != container.end();  ++i)
{
    // resume from the previous hit instead of container.begin()
    it = std::lower_bound(it, container.end(), search_items[i]);
    // lower_bound returns the first element not less than the key,
    // so make sure the found item is an exact match
    if (it != container.end() && search_items[i] < *it)
        it = container.end(); // this item is missing - break out early
}
if (it != container.end())  // found it!
Mark Ransom
I completely forgot about this optimization. It is really good if you pop your target keys into a set and iterate over them and the map at the same time.
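Something like this merge-style walk over both sorted sequences, I think (the Item type and the names are placeholders):

#include <map>
#include <set>

struct Item { int key; /* other fields */ };

void findAll(const std::map<int, Item>& container, const std::set<int>& targets)
{
    std::map<int, Item>::const_iterator c = container.begin();
    std::set<int>::const_iterator t = targets.begin();
    while (c != container.end() && t != targets.end())
    {
        if (c->first < *t)
            ++c;                 // container element not wanted, skip it
        else if (*t < c->first)
            ++t;                 // this target key is not in the container
        else
        {
            /* c->second is one of the wanted objects */
            ++c;
            ++t;
        }
    }
}

That walk is O(n + m) instead of m separate look-ups, which pays off when the target keys cover a large part of the container.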
D.Shawley
+1  A: 

Are you sure that m log2(n) is actually going to be a problem? If you are using a std::map, even a relatively large one, the actual number of comparisons is still pretty small - if you are looking up 10,000 elements in a map of 1,000,000, the number of comparisons should be about 200,000, or about 20 comparisons per target element. That really isn't bad if your key is just a simple integer.

If you were hashing something that didn't already have a nice key, then I would say go with boost::unordered_map. I would implement it with std::map first, profile it, and then decide if you want to make the next jump to Boost.
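
One way to keep that later jump cheap is to hide the choice behind a typedef; a sketch with made-up names, relying only on the find/end interface that both containers share:

#include <map>
// #include <boost/unordered_map.hpp>   // pulled in once you make the jump

struct Item { int key; /* other fields */ };

// change this one line (and the include) after profiling:
typedef std::map<int, Item> ItemIndex;
// typedef boost::unordered_map<int, Item> ItemIndex;

bool contains(const ItemIndex& index, int key)
{
    return index.find(key) != index.end();   // identical call for both containers
}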

D.Shawley
Make sure to look at Mark's answer about using your last position as a start for searching; this is a really nice optimization.
D.Shawley
A: 

If you're frequently performing the same projections on your collection, such as extracting elements with a key of "42", you could consider maintaining these subsets in buckets. You'd internally maintain a hashmap from keys to vectors of elements with that key, and add elements to the appropriate bucket as well as your primary collection representing "everything". Extracting a subgroup of elements is constant time (because the respective collections have already been built), and the memory overhead of maintaining the buckets scales primarily with the number of unique keys in your dataset.

This technique is decidedly less effective if you have a large number of unique key values, and it makes insertions and removals more expensive, but it's good for some situations, so I thought it was at least worth mentioning.
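
A rough sketch of how such buckets might look (all names here are invented; boost::unordered_map is used for the constant-time extraction described above):

#include <boost/unordered_map.hpp>
#include <cstddef>
#include <vector>

struct Item { int key; /* other fields */ };

class BucketedCollection
{
    std::vector<Item> everything;                                 // the primary collection
    boost::unordered_map<int, std::vector<std::size_t> > buckets; // key -> indices into 'everything'
public:
    void add(const Item& item)
    {
        everything.push_back(item);
        buckets[item.key].push_back(everything.size() - 1);       // keep the bucket in sync
    }
    // amortized constant-time extraction of every element with the given key
    const std::vector<std::size_t>& withKey(int key) const
    {
        static const std::vector<std::size_t> empty;
        boost::unordered_map<int, std::vector<std::size_t> >::const_iterator it = buckets.find(key);
        return it == buckets.end() ? empty : it->second;
    }
};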

JohnE