views:

561

answers:

9

In programming we face various situations where we are required to make use of intermediate STL containers as the following example depicts:

while(true)
{
    set < int > tempSet;

    for (int i = 0; i < n; i ++)
    {
        if (m.size() == min && m.size() <= max)
        {
            tempSet.insert(i);
        }
    }
    //Some condition testing code
}

Or

set < int > tempSet;

while(true)
{
    for (int i = 0; i < n; i ++)
    {
        if (m.size() == min && m.size() <= max)
        {
            tempSet.insert(i);
        }
    }
    tempSet.clear();

    //Some condition testing code
}

Which approach is better in terms of time and space complexity considering the present state of C++ compliers?

+2  A: 

The second may be slightly better time-wise, but the difference will be extremely minimal -- the code still has to go through and release each of the items in the set, regardless of whether it's re-creating it or clearing it.

(Space-wise they'll be identical.)

Head Geek
A: 

As a general rule, there is a memory allocation cost associated with loading data into a container. It is far better to avoid paying that cost many times, by reusing a container. If you actually know how many items, or can make a good guess, you should preallocate space in the container up front.

It is particularly noticeable if you are inserting objects into the container, because you can end up paying lots of extra construction/destruction costs, as the container realizes that it is too small, reallocates memory, and copy constructs new objects into the new memory based on the existing objects.

Always preallocate and reuse. It saves time and memory.

EvilTeach
I don't think you can preallocate space in a set.
Head Geek
Yes, I see that. You probably would have to do something goofy with the allocator then to get a decent preallocation.
EvilTeach
I ran a test. the max_size is huge. It is not an issue in this case.
EvilTeach
+4  A: 

If this is a question of set/map/list, it won't make any difference. If it is a question of vector/hash_set/hash_map/string, the second will either be faster or the same speed. Of course, this difference in speed will only be noticeable if you are doing a very large number of operations (10,000 ints pushed into a vector 10,000 times was about twice as fast - 3 seconds and some change vs. 7 seconds.).

If you are doing something like storing a struct/class in your data structure instead of a pointer to one, this difference will be even greater because on each resize, you have to copy all of the elements.

Again, in almost every case, it won't matter - and in the cases where it does matter, it will show up if you are optimizing, and you care about every bit of performance.

hazzen
This question is about sets, not vectors. Also for vectors, I debate about the noticing the difference in speed: vectors are supposed to grow exponentially (usually double each time), so the number of allocations are amortised.
Chris Jester-Young
The question isn't specific to either. If you take the time to actually run some speed tests for the case in which is does matter, you will notice a large performance hit. It isn't the allocation that hurts, it is the memcpy after the realloc.
hazzen
+7  A: 

I say the first one is more bug-proof. You're not going to have to remember to clear it (it'll just go out of scope and be done). :-)

Performance-wise there isn't much of a difference, but you should nevertheless measure both cases and see for yourself.

Chris Jester-Young
Follow up on the bug-proof comment to the OP: Imagine what will happen when adding a continue statement somewhere in the while-loop.
dalle
Yep, totally agree! :-)
Chris Jester-Young
A: 

There is very little difference, although your second one avoids calling the constructor and destructor multiple times. On the other hand your first one is one line shorter and guarantees that the container is not visible outside the scope of the loop.

da_code_monkey
+12  A: 

The first version is correct. It is simpler in almost every way. Easier to write, easier to read, easier to understand, easier to maintain, etc....

The second version may be faster, but then again it may not. You would need to show that it had a significant advantage before using it. In most non-trivial cases I would guess that there would not be a measurable performance difference between the two.

Sometimes in embedded programming it is useful to avoid putting things on the stack; in that case the second version would be correct.

Use the first version by default; use the second only if you can give a good reason for doing so (if the reason is performance, then you should have evidence that the benefit is significant).

ejgottl
I agree: don't make variables visible where they're not needed, unless with good reason.
xtofl
+1  A: 

There will be very little performance difference between the two. You should make your decision based on code readability and robustness.

I think the first example is more readable and safer - what happens in six months time when you add a condition to the loop, perhaps with a continue in there somewhere - in the second example, it will bypass the clear() and you will have a bug.

Adam Pierce
+1  A: 

You are putting your set on the stack, therefor allocation cost is almost zero!

ypnos
The allocation cost for the set object itself is near zero, but the items within it are still allocated from the heap.
Head Geek
@Head Geek: The construction/destruction of the objects within the set are the same in either example - the only difference is the construction/destruction of the set object itself. So ypnos's point is valid.
Michael Burr
@Mike B: Sorry, but I don't see your point. As I read it, ypnos is talking about allocation cost -- the time it takes to allocate memory for the object -- so my objection to it still stands.
Head Geek
I'm talking about the allocation cost of the set object. The insertion of items, and therefore their memory (de)allocation, is the same in both examples! Using clear() effectively does the same as calling deconstructor (free items) and constructor (initialize metavariables) of the set object.
ypnos
A: 

I think you can preallocate a certain number of elements for STL containers, thus having a constant memory allocation cost if you know how many elements will be in the container.

Paul Nathan
std::set has no such functionality. Perhaps you're thinking of std::vector::reserve?
Don Neufeld
@Don: yes, I was.
Paul Nathan