views: 1558
answers: 10
Say I have a method that needs to pull 8 values from a map with 100 elements in it. Which do you think would be preferable:

Walk in a for loop from begin to end once, pulling the elements out by switching on the key?

Or using find 8 times to get those values?

+3  A: 

I would use find 8 times. It will be less (and more obvious) code.

You should try not to make performance judgements based on small numbers, since (a) it isn't likely to be a performance bottleneck either way at this size, and (b) the numbers may change in the future -- choose the algorithm with the best asymptotic performance.

Note: you mention 'switching' on the key ... that may apply in your case, but in general you can't switch on the key value in a map. Allowing for this would make the code for searching for M items in a map by iteration even more complex.

Rob Walker
+9  A: 

Walking the map will take you O(n) time to find a random element.

Map is a balanced binary tree, so doing a find is O(log n).

Thus doing 8 finds costs 8*log2(n) comparisons, while walking the whole map is O(n). The larger the map, the larger the gain; for large enough maps, doing finds will be faster than iterating.

In non-random cases, if there is reason to think the items you want are near each other in the tree, or near the "beginning" (left side), then walking/iterating would be faster. But that seems unlikely.

SoapBox
Note that STL map is typically *not* a height-balanced tree. It's usually implemented as Red-Black, which IIRC means that the depth of a node in the worst case is 2*log2(n), although the typical case is better. Not that this really affects the analysis in this case.
Steve Jessop
Red-Black trees are "self balancing" trees.
SoapBox
Correct. That means there's a limit on how unbalanced they can be; it doesn't mean their height never exceeds log(n).
Steve Jessop
A: 

Here is the time-complexity analysis of the two approaches (n is the item count in the map; find is guaranteed to do its lookup in logarithmic or better time):

8 * log2(n)  for the 8 finds
n            for iterating through everything

The first one is bigger for small n (8 vs. 2 at n=2, for example), but at around n=43 the first one becomes smaller than the second and stays that way. So you will want to use the first method, given that it is also more convenient to code.

Johannes Schaub - litb
std::map is not necessarily balanced. Worst-case depth is proportional to log(n) but not necessarily equal to log(n). And log(0) isn't 1, it's undefined ;-)
Steve Jessop
yeah indeed. i'm sorry i wanted to write "logarithmic". dumb litb. that was too late in the evening
Johannes Schaub - litb
i hope you don't think i'm writing nonsense all the time. just when i'm getting tired. lol
Johannes Schaub - litb
A: 

You should use find 8 times.

Think of the switch approach as taking each node and comparing it 8 times. That's 800 comparisons, and you lose all benefit of the map being keyed at all; it might as well be a list.

With the find approach, you traverse the tree using the benefits of the map. std::map is typically implemented as a binary tree, meaning searching for a key only requires comparing your key down to the depth of the tree, which is about 7 for a 100-element balanced binary tree. Now you can find all 8 elements with only about 8*7 = 56 comparisons.

bradtgmurray
std::map is a binary tree, but is not necessarily balanced. Worst-case depth is proportional to log(n), but not necessarily equal to log(n).
Steve Jessop
A: 

Thanks all, confirms my thinking.

+5  A: 

While I'd go with the find option, people put too much stress on asymptotic performance.

The fact is that asymptotic performance is a handy guide for algorithms that can receive reasonably large inputs, but even then it isn't foolproof. It's quite possible for an algorithm with worse asymptotic performance than another to be faster for any reasonable input.

And then there are times when your input size is known to be fairly small (or even fixed). In such cases asymptotic performance is almost meaningless.

Artelius
+1  A: 

8 finds is best, because the code's simpler and clearer.

Thinking about performance is more fun, though, so I'll do that too.

As Artelius said while I was writing this answer, ignore the complexity. It's not relevant because you know that n=100. For example, insertion sort has worse algorithmic complexity than quicksort (at least in the average case), but in almost all implementations, insertion sort is faster than quicksort for small n and so quicksorts break out to insertion sort towards the end. Your n is also small, so the limit as n -> infinity isn't what matters.

Since the code for both options is simple to write, I'd suggest profiling it. This will (a) tell you which is faster, and (b) prove that both are so fast that it doesn't matter which you do (unless it's the only thing your program does, and it does it a lot). Especially since you talk about switching on the key - if the key is an integral type then the limiting factor is more likely to be memory cache issues than any actual processing.

However failing that, usually the way to compare searching algorithms is to count the comparisons, on the assumption that they're much slower than traversing the structures. If nothing else, each comparison accesses memory and is an unpredictable branch, which are two things CPUs are often worst at.

If you sort your 8 elements before you start (which takes 24 comparisons or so) instead of the switch you propose, then because the map is also sorted, you only have to do one comparison at each node you traverse, plus one comparison per item you're searching for (compare one node from each "side"; if they match, increment both sides; if they don't match, increment the side with the smaller element).

So that's 8+100 in the worst case, or less if you find all 8 before you get to the end. But the average position of the last of 8, if they're randomly located in the map, is still something like 8/9 of the way through. So call it 8+88+24 = 120 comparisons, with 132 as the worst case. The best case is 8 (if the things you're searching for are all at the start) + 24 (for the initial sort) = 32 comparisons, or even better if you get lucky on the sort or your 8 search items are ready-sorted for some reason.

The average depth of a node in a Red-Black tree (which map usually is) is slightly over log2(n). Call it 7 in this case since 2^7 is 128. So finding 8 elements should take something like 56 comparisons. IIRC the balance characteristic of a Red-Black tree is that the deepest node is no more than twice the depth of the shallowest. So the worst case depth is floor(2*log2(n)), call it 15, for a total of 120, and the best is ceil(1/2 log2(n)), which is 4. That's 32 comparisons again.

So assuming that comparisons are the only thing that affects speed, the 8 finds will be somewhere between 4 times as fast, and 4 times as slow, as the linear traversal, with a 2x better average.

The linear traversal will probably touch more memory, though, so might be slower on that account. But ultimately for n=100 you're talking millisecond times, so just do whatever's the simplest code (probably the 8 finds). And did I mention that if you really want to know the speed you can't hope to predict, you just have to profile it?

Steve Jessop
A: 

If it's that critical, I would implement both and benchmark the performance.

In theory it's whether

8 * lg(100) >?< 100

Other considerations are if either of those numbers will ever change -- will it ever be more than 100 elements; will you ever do more than 8 searches?

JohnMcG
+1  A: 

As others have noted, I would probably just use find() on a map eight times and be done with it. But there's an alternative to consider depending on your needs. If the items in the map aren't going to change much after the map is constructed, or you don't need to interleave insertions with lookups, you might try just keeping the key/value pairs in a sorted vector. If you do this, then you can use the lower_bound() function to do a binary search in logarithmic time. This has the benefit that if the keys that you are looking for can be ordered (and you know that they'll always be present) then you can use the iterator returned from the previous lookup as the lower bound for the next. e.g.,

std::vector<std::pair<std::string, int>> v = ...;  // key/value pairs, sorted by key
auto by_key = [](const std::pair<std::string, int>& p, const std::string& k) { return p.first < k; };
auto a = std::lower_bound(v.begin(), v.end(), std::string("a"), by_key);
auto b = std::lower_bound(a + 1, v.end(), std::string("b"), by_key);
auto c = std::lower_bound(b + 1, v.end(), std::string("c"), by_key);
// ...

Both approaches have logarithmic lookup, but this can help reduce the constant somewhat.

Boojum
A: 

Let's assume "find" bails when it finds the key.

Let's further assume that you code the "switch" sensibly, and it quits checking after it finds a match. We will also assume you don't bother to code it to bail on the whole process once all 8 have been found (that would probably be a pain to code up).

The "8 find" approach can expect (iow: on average) to perform 50 * 8 = 400 comparisons.

The "switch" approach can expect (iow: on average) to perform (8 * 100) - 28 = 772 comparisons. (The -28 comes from the inner comparisons stopping at a match: for each of the 8 matching nodes, the average match position among the 8 cases is 4.5 rather than 8, saving 3.5 comparisons apiece, and 8 * 3.5 = 28.)

So in terms of comparisons, the 8 finds approach is better. However, the number of comparisons is small enough that you'd be better off just going with the option that is easier to understand. That will probably be the 8 find approach too though.

T.E.D.