+1  Q: 

vector reserve c++

I have a very large multidimensional vector that changes in size all the time. Is there any point in using vector::reserve() when I only know a good approximation of the sizes?

So basically I have a vector

A[256*256][x][y]

where x goes from 0 to 50 for every iteration in the program and then back to 0 again. The y values can differ every time, which means that for each of the [256*256][x] elements the inner vector can have a different size, but still smaller than 256.

So to clarify my problem this is what I have:

vector<vector<vector<int> > > A;
for (int i = 0; i < 256 * 256; i++) {
  A.push_back(vector<vector<int> >());
  A[i].push_back(vector<int>());
  A[i][0].push_back(SOME_VALUE);
}

Add elements to the vector...

A.clear();

And after this I do the same thing again from the top.

When and how should I reserve space for the vectors? If I have understood this correctly, I would save a lot of time by using reserve, since I change the sizes all the time.

What would be the negative/positive sides of reserving the maximum size my vector can have, which would be [256*256][50][256] in some cases?

BTW. I am aware of different Matrix Templates and Boost, but have decided to go with vectors on this one...

EDIT: I was also wondering how to use the reserve function with multidimensional vectors. If I only reserve the vector in two dimensions, will it then copy the whole thing if I exceed its capacity in the third dimension?

A: 

You have a working implementation but are concerned about the performance. If your profiling shows it to be a bottleneck, you can consider using a naked C-style array of integers rather than the vector of vectors of vectors.

See how-do-i-work-with-dynamic-multi-dimensional-arrays-in-c for an example

You can re-use the same allocation each time, reallocing as necessary and eventually keeping it at the high-tide mark of usage.

If the vectors are indeed the bottleneck, then beyond avoiding the per-iteration sizing operations, performance will likely be dominated by your access pattern into the array. Try to access the highest orders sequentially.
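A rough sketch of what that could look like (FlatBuffer, ensure_capacity, and offset are made-up names for illustration; error checking and initialization details are omitted):

#include <cstddef>
#include <cstdlib>

// Illustrative flat buffer reused across iterations.
// Initialize with data = 0 and capacity = 0 before first use.
struct FlatBuffer {
    int*        data;
    std::size_t capacity;   // in ints; the "high-tide mark" so far
};

// Grow only when needed; the buffer never shrinks during the run.
void ensure_capacity(FlatBuffer& buf, std::size_t needed)
{
    if (needed > buf.capacity) {
        buf.data = static_cast<int*>(std::realloc(buf.data, needed * sizeof(int)));
        buf.capacity = needed;
    }
}

// Row-major offset of element (i, x, y), assuming fixed upper bounds
// max_x and max_y (50 and 256 in the question).
std::size_t offset(std::size_t i, std::size_t x, std::size_t y,
                   std::size_t max_x, std::size_t max_y)
{
    return (i * max_x + x) * max_y + y;
}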

Will
Don't do this. Vectors handle the reallocation for you. Use of realloc() in a C++ program is almost sure to be the wrong thing to do.
anon
In this particular example, the vector of vectors of vectors of POD, how can malloc/realloc/free be inappropriate?
Will
At least, since I have different sizes in the "third" dimension, I can easily query the size if I use vectors...
mrbuxley
+2  A: 

Whenever you push a vector into another vector, set the size in the pushed vector's constructor:

 A.push_back(vector<vector<int>>( somesize ));
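Applied to the dimensions in the question, a minimal sketch might look like the following (the 50 and the estimated_y placeholder are approximations taken from the question, not exact requirements; build() is just an illustrative wrapper):

#include <vector>
using std::vector;

void build()
{
    vector<vector<vector<int> > > A;
    A.reserve(256 * 256);                       // outer size is known exactly
    for (int i = 0; i < 256 * 256; i++) {
        // 50 x-slots sized in the pushed vector's constructor, as suggested above
        A.push_back(vector<vector<int> >(50));
        // optionally reserve an estimated y size in each inner vector too:
        // for (int x = 0; x < 50; x++) A[i][x].reserve(estimated_y);
    }
}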
anon
Thanks! I didn't know one could do that...
mrbuxley
">>" in templates is not always handled properly. Insert the space to make it portable. Just nagging.
overrider
+1  A: 

If you know the size of a vector at construction time, pass the size to the c'tor and assign using operator[] instead of push_back. If you're not totally sure about the final size, make a guess (maybe add a little bit more) and use reserve to have the vector reserve enough memory upfront.
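For example (a minimal sketch; some_value(), more_values(), and next_value() are hypothetical stand-ins for however you actually produce the data):

std::vector<int> exact(50);            // size known up front: construct with it
for (std::size_t i = 0; i < exact.size(); ++i)
    exact[i] = some_value(i);          // then assign through operator[]

std::vector<int> approx;
approx.reserve(300);                   // size only estimated: guess a bit high,
while (more_values())                  // then push_back without reallocations
    approx.push_back(next_value());    // (as long as the guess is not exceeded)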

What would be the negative/positive sides of reserving the maximum size my vector can have which would be [256*256][50][256] in some cases.

Negative side: potential waste of memory. Positive side: less CPU time, less heap fragmentation. It's a memory/cpu tradeoff, the optimum choice depends on your application. If you're not memory-bound (on most consumer machines there's more than enough RAM), consider reserving upfront.

To decide how much memory to reserve, look at the average memory consumption, not at the peak (reserving 256*256*50*256 is not a good idea unless such dimensions are needed regularly)

Alexander Gessler
+2  A: 

To help with discussion you can consider the following typedefs:

typedef std::vector<int> int_t;   // internal vector
typedef std::vector<int_t> mid_t; // intermediate
typedef std::vector<mid_t> ext_t; // external

The cost of growing (a capacity increase in) an int_t affects only the contents of that particular vector and no other element. The cost of growing a mid_t requires copying all of the elements stored in that vector, that is, all of its int_t vectors, which is considerably more costly. The cost of growing the ext_t is huge: it requires copying every element already stored in the container.

Now, to increase performance, it is most important to get the ext_t size correct (it seems fixed at 256*256 in your question). Then get the intermediate mid_t size right so that expensive reallocations are rare.
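With the typedefs above, that sizing could look roughly like this (256*256 is the fixed outer size from the question, 50 and 256 the approximate inner bounds):

ext_t A(256 * 256);                // outer size is fixed, so construct it once
for (std::size_t i = 0; i < A.size(); ++i) {
    A[i].reserve(50);              // mid_t: room for the x dimension up front
    // each int_t can reserve its own estimate when it is created, e.g.
    // A[i].push_back(int_t()); A[i].back().reserve(256);
}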

The amount of memory you are talking about is huge, so you might want to consider less standard ways to solve your problem. The first thing that comes to mind is adding an extra level of indirection. If instead of holding the actual vectors you hold smart pointers to the vectors, you can reduce the cost of growing the mid_t and ext_t vectors (if the ext_t size is fixed, just use a vector of mid_t). This implies that code using your data structure becomes more complex (or better, add a wrapper that takes care of the indirection). Each int_t vector will be allocated once in memory and will never move in either mid_t or ext_t reallocations. The cost of reallocating a mid_t is then proportional to the number of allocated int_t vectors, not to the actual number of inserted integers.

using std::tr1::shared_ptr; // or boost::shared_ptr
typedef std::vector<int> int_t;
typedef std::vector< shared_ptr<int_t> > mid_t;
typedef std::vector< shared_ptr<mid_t> > ext_t;

Another thing that you should take into account is that std::vector::clear() does not free the allocated internal space in the vector, only destroys the contained objects and sets the size to 0. That is, calling clear() will never release memory. The pattern for actually releasing the allocated memory in a vector is:

typedef std::vector<...> myvector_type;
myvector_type myvector;
...
myvector_type().swap( myvector ); // swap with a default-constructed temporary
David Rodríguez - dribeas
Wow, thank you very much. What a Great Answer! +10000!
mrbuxley
@David - are you sure you got the `internal_t` and `intermediate_t` bits right in the first snippet?
Manuel
@Manuel: not :) I have corrected the typedefs
David Rodríguez - dribeas
A: 

I am having a problem similar to #0.

A solution with the shared pointers might be the way to go, but I would like to try a solution where the vector<vector<vector<int> > > A simply has enough space .reserve()'d.

Will A.reserve(500) (assuming 500 is the size, or alternatively an upper bound on the size) be enough to hold "2D" vectors of large size, say [1000][10000]?

The reason for my question is mainly that I cannot see any way of reasonably estimating the size of the interior of A at the time of the .reserve(500) call.

An example of my question:


vector<vector<vector<int> > > A;
A.reserve(500 + 1);
vector<vector<int> > temp2;
vector<int> temp1(666, 666);
for (int i = 0; i < 500; i++)
{
    A.push_back(temp2);
    for (int j = 0; j < 10000; j++)
    {
        A.back().push_back(temp1);
    }
}

Will this ensure that no reallocation is done for A?

If temp2.reserve(100000) and temp1.reserve(1000) were added at creation, will this ensure that no reallocation occurs at all?

In the above please disregard the fact that memory could be wasted due to conservative .reserve() calls.

Thank you all in advance!

Tue Christensen