I am writing code that inserts rows from a database into a vector. The vectors are then stored in a std::map. This architecture allows me to logically partition the datasets (vectors), based on the map key.

In my code, I will be retrieving a dataset (i.e. vector) from the std::map, adding/removing rows to it or performing some other logic on it, and then sticking the vector back into the map (all this goes on in a while() loop).

I have two questions, both of which are related to the (potentially) large number of elements stored in a vector. A vector may hold anything from a few tens to several tens of thousands of elements, and there is no way for me to know beforehand how many records will be retrieved from the database. The memory management scheme of std::vector (i.e. alloc/dealloc) therefore becomes very important for using memory efficiently and avoiding unnecessary (memory) fragmentation.

My questions are:

  1. Given the potentially large number of elements that a row can store, ideally I would like to alloc/dealloc from a memory pool. But I don't know how to use std::vector with a memory pool, and I don't know whether that is going to be unnecessarily complicated. If that is overkill (or too complicated), my other idea is to preallocate a fixed slab of memory when the vector is first created. But this is also likely to be too simplistic, as the number of elements is likely to vary widely from one vector instance to another, leading to (memory) fragmentation and inefficient use of memory.

    What is the recommended approach here?

  2. Given that std::map (like all STL containers, IIRC) stores a COPY of the value element, the prospect of making copies of vectors containing several tens of thousands of elements is just plain wrong. I am therefore thinking of writing a SmartPointerMap wrapper around std::map and storing pointers to the vectors instead of the actual vectors.

    Am I on the right track? If not, what is a better solution? If yes, is there a Boost library I can use (instead of writing my SmartPointerMap class template)?

[Edit]

I am building on Linux (Ubuntu 9.10) with gcc 4.4.1

+1  A: 

To answer question 2:

#include <map>
#include <string>
#include <vector>

using namespace std;

map<string, vector<int>* > foo;
vector<int>* pointer = foo["thekey"];

If using smart (reference-counted) pointers is a requirement:

#include <tr1/memory>
#include <map>
#include <string>
#include <vector>

using namespace std;
using namespace std::tr1;

map<string, shared_ptr<vector<int> > > foo;
shared_ptr<vector<int> > pointer = foo["thekey"];

To answer question #1, you can write a new allocator template class and declare your vectors to use that allocator, but I don't really know anything about writing allocators:

map<string, vector<int, myallocator<int> >* > foo;

In particular I don't know how to design an allocator that will avoid fragmenting your memory pool. (But if you have that part answered, then writing a custom allocator would be the way to go.)

Ken Bloom
+4  A: 

Assuming that you're using vector for the map's data_type and not the key_type, you could modify the data in place without copying it. std::map::operator[]() returns a reference to a non-const data_type, and the iterator returned from the non-const version of std::map::find() allows you to modify the data.

What if you need to change the key when you change the data? You can use std::swap() to move the data from one vector to another without copying.

Don't forget that vector does not reduce its capacity() when you erase elements. Also, vector will usually allocate more capacity() than you need, so that push_back() takes amortized constant time. For very large vectors, these behaviors may significantly increase your program's memory usage, if you're not careful.

If you're using vector for the map's key_type and the map has extremely large keys, then pointers or smart pointers might help. However, if this is the case, you must make sure not to modify the contents of a key that is pointed to by one of the map values, because std::map is not designed to handle that.

As for the custom allocator idea, get your code working with the standard allocator first, and then see if it's fast enough; it may well be fine. If it isn't, profile to see where the time is actually being spent (it may be somewhere else entirely, such as in the database code). And if you do write a custom allocator, benchmark it against the standard allocator; otherwise you won't know whether it's actually an improvement, and it could even be slower.

bk1e
See this question about reducing the capacity of a vector: http://stackoverflow.com/questions/1111078/reduce-the-capacity-of-an-stl-vector
Emile Cormier
+2  A: 

Wrt #1, the default heap implementation in GCC/Linux (ptmalloc) will use a free list (aka memory pool) for small objects (<=64 bytes by default last time I checked). If you still want to use a custom allocator, the Boost.Pool library has allocators that satisfy the Standard Allocator requirements. As bk1e suggested, benchmark before and after to see if it's even worth it.

When populating your vectors from the database, if possible/practical, use vector<T>::reserve() to make the vector allocate enough memory up front and avoid reallocations.

Hope this helps.

Emile Cormier