Hi, I have a question about std::vector.

I have a very memory-intensive algorithm where I foresee that predicting vector sizes and reserving enough memory for the vectors in advance will help me a lot with reducing memory usage.

Which of the following is better:

for ( ... ) {
  std::vector<Type> my_vector;
  my_vector.reserve(stuff_count);
  // Do stuff, and append stuff to my_vector.
}

Or this:

std::vector<Type> my_vector;
for ( ... ) {
  my_vector.clear();
  my_vector.reserve(stuff_count);
  // Do stuff, and append stuff to my_vector.
}

Please tell me which is best, or if there is an even better way of doing stuff.

Thank you very much in advance!

+5  A: 

The second one could be slightly faster, but I find the first one cleaner.

Maurice Perry
Clean is OK, but in my situation cleanliness doesn't matter: my algorithm needs as much optimization as possible, and my unoptimized tests can easily use 10GB of memory.
Nailer
Then you should not be asking questions like this (everybody's opinion may vary or be wrong). What you should do is profile the code and time it on a smaller data set.
Martin York
+10  A: 

With the first variant you allocate and free the vector's buffer on every iteration, and that's usually quite costly. With the second variant you only reallocate occasionally, when stuff_count grows beyond the current capacity. The second variant is better since speed is a priority for you.

It's unclear from your question where the number of elements is known from. Maybe you can even quickly calculate the maximum number of elements across all iterations, reserve that once, and have no reallocation at all (a sketch of this idea follows the comments below).

sharptooth
"that's usually quite costly": that's a bold statement without profiling. The C++ memory allocator is designed for exactly this type of situation, so I doubt it (but even that assertion is bold and needs to be tested).
Martin York
Well, we once had to profile in a similar situation. Such reallocations in extreme cases were responsible for around 95% of the running time: a buffer was allocated, data was copied in, briefly analyzed, and then the buffer was discarded. That was rather surprising.
sharptooth
Which leads to the conclusion that if one wants fast execution, one should avoid unnecessary reallocations.
sharptooth
@sharptooth - why wouldn't you pass that data by reference if all you're going to do is analyze the data held in the buffer being passed?
MacX.dmg
@MacX.dmg - That's exactly what we did to fix the problem. But before profiling, no one expected that innocent copying of a small array could introduce such a penalty.
sharptooth
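
A minimal sketch of the reserve-the-maximum idea from the answer above, assuming the per-iteration counts can be computed cheaply up front (compute_stuff_count and process are hypothetical stand-ins for the asker's real code):

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helpers standing in for the asker's real code.
std::size_t compute_stuff_count(std::size_t iteration);
void process(std::vector<int>& out, std::size_t iteration);

void run(std::size_t iterations) {
    // First pass: find the largest count any iteration will need.
    std::size_t max_count = 0;
    for (std::size_t i = 0; i < iterations; ++i)
        max_count = std::max(max_count, compute_stuff_count(i));

    // One allocation serves every iteration; clear() keeps the capacity.
    std::vector<int> my_vector;
    my_vector.reserve(max_count);
    for (std::size_t i = 0; i < iterations; ++i) {
        my_vector.clear();
        process(my_vector, i); // Do stuff, and append stuff to my_vector.
    }
}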
+5  A: 

As the differences in the code are trivial, why not test both approaches and see which works best for your particular application?

anon
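
For instance, a rough timing harness along these lines (a sketch only: the synthetic loop body and the counts are stand-ins for the asker's real workload):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t iterations = 10000;
    const std::size_t stuff_count = 1000;
    std::size_t sink = 0; // consumed below so the work isn't optimized away

    auto time_it = [](auto&& body) {
        auto start = std::chrono::steady_clock::now();
        body();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    };

    // Variant 1: a fresh vector (and a fresh allocation) every iteration.
    double t1 = time_it([&] {
        for (std::size_t i = 0; i < iterations; ++i) {
            std::vector<int> v;
            v.reserve(stuff_count);
            for (std::size_t j = 0; j < stuff_count; ++j) v.push_back(int(j));
            sink += v.back();
        }
    });

    // Variant 2: one vector, cleared and reused; capacity is kept.
    double t2 = time_it([&] {
        std::vector<int> v;
        for (std::size_t i = 0; i < iterations; ++i) {
            v.clear();
            v.reserve(stuff_count);
            for (std::size_t j = 0; j < stuff_count; ++j) v.push_back(int(j));
            sink += v.back();
        }
    });

    std::cout << "variant 1: " << t1 << "s\n"
              << "variant 2: " << t2 << "s (sink=" << sink << ")\n";
}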
+1  A: 

It depends a little on how Type needs to be constructed/destructed, I would say. If it is a POD that does not require destruction, you can skip the clear(), which calls all the destructors in a loop, and use the vector as a fixed-size array instead:

std::vector<Type> my_vector(size);
for (int index = 0; index < size; ++index)
{
  // Do stuff, then write the result into the pre-sized vector.
  my_vector[index] = some_value;
}

(Caveat: code untested)

Carl Seleborg
If it's a POD, I'd expect the vector to optimize the destruction away in any case. Test that before jumping to conclusions about performance.
jalf
Agreed. (with Jalf)
Martin York
In theory, you're right, jalf. Take my answer as a hint that you might be destructing things a lot, and that there's probably some work that can be done there.
Carl Seleborg
+1  A: 

...reserving enough memory for the vectors in advance will help me a lot with reducing memory usage

err... what?! That makes no sense at all. Reserving memory doesn't help with reducing memory usage in any way. It prevents the need for constant reallocation, which makes things faster, but as far as usage goes you get no benefit.

OJ
Indeed the number of free bytes will be equal, but the largest allocatable chunk will be smaller: reallocating over and over again until you get the right size will eventually fragment your heap.
xtofl
On top of that, a vector's elements must live in contiguous memory, so if the current buffer can't be extended in place, a whole new block needs to be allocated, the old contents copied over, and the old block deallocated, temporarily using (oldsize + newsize) memory. If you have huge memory requirements, this becomes relevant.
Pieter
Vectors typically allocate double the memory they need when you append and there isn't enough room for the next element. This means that if I have a vector just big enough for 100 elements, when I add the 101st element the vector will allocate enough memory for 200 elements (see the sketch after this thread).
Nailer
True, good point. I hadn't thought of the fragmentation issue. I can't see how ANY of the above implementations will help with that anyway; I don't see custom allocation or memory handling, so the issue would STILL arise. My point remains valid ;)
OJ
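
On the doubling Nailer mentions: the growth factor is implementation-defined (commonly 1.5x or 2x), and you can observe what your own standard library does with a few lines like these:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {   // a reallocation happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last_capacity << '\n';
        }
    }
}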
+5  A: 

I foresee that predicting vector sizes and reserving enough memory for the vectors in advance will help me a lot with reducing memory usage.

Try and act like an engineer, not a fortune teller. Create a test and measure the difference.

Pete Kirkham
A: 

If you need to do a lot of appends, use std::deque instead of std::vector and just use push_back.

Edouard A.
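
A sketch of that alternative. std::deque grows in fixed-size blocks, so appending never reallocates and copies the existing elements, though the storage is no longer contiguous:

#include <deque>

int main() {
    std::deque<int> my_deque;
    for (int i = 0; i < 1000000; ++i)
        my_deque.push_back(i); // grows block by block; existing elements never move
}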
A: 

How about this?

std::vector<DataType> my_vector;
my_vector.resize(sizeof(yourData) / sizeof(DataType));
memcpy(&my_vector[0], &yourData, sizeof(yourData));
PiNoYBoY82
Why not std::copy()?
Patrick Daryll Glandien
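
For reference, the std::copy alternative Patrick alludes to, assuming yourData is a plain array (int used here purely as an illustration):

#include <algorithm>
#include <vector>

int main() {
    int yourData[100] = {};

    // std::copy into a vector sized to match...
    std::vector<int> my_vector(sizeof(yourData) / sizeof(yourData[0]));
    std::copy(yourData, yourData + 100, my_vector.begin());

    // ...or simply let the range constructor do both steps at once.
    std::vector<int> my_vector2(yourData, yourData + 100);
}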
+1  A: 

The second one will hold on to the maximum memory used across all iterations of the loop, i.e., enough for the largest stuff_count. std::vector::clear() does not necessarily free memory: if you call std::vector::capacity() before and after std::vector::clear(), a standard-conforming implementation may return the same value.

At best, you'll reduce the number of times you allocate memory with the above scheme. But you certainly won't be reducing the memory footprint at any point. If you want to shrink to the reserved amount of memory, you should use the vector swap idiom:

std::vector<Type>().swap(my_vector);
my_vector.reserve(stuff_count);

Or your first solution, since the overall effect will be the same.

MSN
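
A quick way to see the difference between clear() and the swap idiom (the exact capacities printed are implementation-specific):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1000);
    v.clear();
    std::cout << v.capacity() << '\n'; // typically still >= 1000: clear() keeps the buffer
    std::vector<int>().swap(v);        // swap with an empty temporary releases the buffer
    std::cout << v.capacity() << '\n'; // typically 0
}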