I have an STL vector My_Partition_Vector of Partition objects, defined as

struct Partition // the event log data structure
{
    int key;
    std::vector<std::vector<char> > partitions;
    float modularity;
};

The actual nested structure of Partition.partitions varies from object to object, but the total number of chars stored in Partition.partitions is always 16.

I therefore assumed that the total size of the object should be more or less 24 bytes (16 + 4 + 4). However, for every 100,000 items I add to My_Partition_Vector, memory consumption (found using ps -aux) increases by around 20 MB, indicating around 209 bytes for each Partition object.

This is nearly a 9-fold increase!? Where is all this extra memory usage coming from? Some kind of padding in the STL vector, or the struct? How can I resolve this (and stop it reaching into swap)?

A: 

For 16 bytes of payload, this is a complete and total waste. You're storing a hell of a lot of bookkeeping for very small objects. A vector of vectors is the wrong solution to use. You should check sizeof(vector) - it's not insignificant, as it performs a substantial function. On my compiler, sizeof(vector) is 20. So each Partition is 4 + 4 + 16 + 20 + 20 * (number of inner partitions), plus memory overheads like the vectors not being the perfect size.
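A quick way to check this on your own platform (a minimal sketch; the printed value is implementation-dependent):

   #include <iostream>
   #include <vector>

   int main ()
      {
      // The per-object bookkeeping cost of a vector itself, before any
      // heap allocation it makes. Commonly 12 bytes on 32-bit platforms
      // and 24 on 64-bit ones.
      std::cout << "sizeof(std::vector<char>) = "
                << sizeof(std::vector<char>) << "\n";
      return 0;
      }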

You're only storing 16 bytes of data, and wasting ridiculous amounts of memory by allocating them in the most segregated, highest-overhead way you could possibly think of. The vector doesn't use a lot of memory - you have a terrible design.

DeadMG
Thanks for such a constructive answer
zenna
-1 for utter lack of civility.
SCFrench
"memory overheads like the vectors not being the perfect size" - to expand on this - each vector will most likely make a memory allocation to store some bytes in (well, empty ones might not). But the memory allocator has its own overhead to keep track of allocations (maybe a couple of pointers or so per allocation), and due to alignment requirements a memory allocator will in practice pretty much always give you a multiple of 8 bytes even if you ("you" being the vector) only ask for 1. I say "most likely" and "pretty much always" because it's implementation-dependent, but typical.
Steve Jessop
To be more constructive - assuming you can't have more than 16 partitions in a Partition, it would be leaner just to have Partition contain an array of 16 bytes for the data, plus an array of 16 bytes each of which contains the offset of the start of a partition. Or 16 `char*` with the same purpose if you're feeling generous, or 8 bytes with two offsets packed into each if you're feeling mean.
Steve Jessop
@DeadMG: "you're only storing 16 bytes of data" - not really true, since presumably the partition structure is significant, and hence is also "data".
Steve Jessop
The OP only seemed to consider his 16 bytes significant, since that's all he totalled up when he came up with his own sizeof(Partition).
DeadMG
In the end I stored the partitions as a 128-bit bitset in the format [Partition_Number Node_Number ...], using 4 bits for each integer. This brings the total size of the object down to 24 bytes, which is what I hoped for. I could have saved a few extra bytes by only storing the partition number once, but I thought that would just overcomplicate things. For those interested, I am finding all connected partitions on a graph.
zenna
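For illustration, here is a minimal sketch of that 4-bit packing (the helper names are hypothetical, not the OP's actual code). sizeof(std::bitset<128>) is 16 bytes, so together with the int key and float modularity the struct comes to the 24 bytes mentioned:

   #include <bitset>

   typedef std::bitset<128> PackedPartitions; // room for 32 four-bit fields

   // Store a value in the range 0-15 in the n-th 4-bit field.
   void set_field( PackedPartitions &bits, int n, unsigned value )
      {
      for ( int b = 0; b < 4; ++b )
         bits[n*4 + b] = (value >> b) & 1;
      }

   // Read the n-th 4-bit field back out.
   unsigned get_field( const PackedPartitions &bits, int n )
      {
      unsigned value = 0;
      for ( int b = 0; b < 4; ++b )
         value |= (unsigned)bits[n*4 + b] << b;
      return value;
      }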
+1  A: 

If you store a near-constant number of objects, then I suggest using a 2-dimensional array.

The most likely reason for the memory consumption is debug data. STL implementations usually store A LOT of debug data. Never profile an application with debug flags on.

Ramon Zarazua
I don't believe debug data is the fundamental cause of the memory consumption issue described by the OP. Based on its capabilities, a std::vector just requires a lot of overhead (relative to the small allocations in question).
nobar
+3  A: 

For one thing, std::vector models a dynamic array, so if you know that you'll always have 16 chars in partitions, using std::vector is overkill. Use a good old C-style array/matrix, boost::array, or boost::multi_array.

To reduce the number of re-allocations needed when inserting/adding elements, std::vector is allowed (because of its contiguous memory layout constraints) to preallocate memory for a certain number of elements upfront, and its capacity() member function will tell you how much.
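For example (a small sketch; the exact growth policy is implementation-defined):

   #include <iostream>
   #include <vector>

   int main ()
      {
      std::vector<char> v;
      for ( int i = 0; i < 17; ++i )
         v.push_back('x');
      // size() is 17, but capacity() will typically be larger (e.g. 32
      // under a doubling growth policy); the difference is reserved
      // memory that counts toward each vector's footprint.
      std::cout << "size = " << v.size()
                << ", capacity = " << v.capacity() << "\n";
      return 0;
      }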

Eugen Constantin Dinca
boost::multi_array seems to do what is needed here -- good suggestion!
nobar
On further review, despite claims of efficiency, boost::multi_array doesn't seem to give any guarantees about memory use. Experimentally, I found it to be just as wasteful as nested vectors -- using about 10x more memory than needed for a 2D array of 16 total bytes. That was with "-DNDEBUG -O3", so it should have been well optimized.
nobar
@Eugen: I added my code as a new answer.
nobar
+1  A: 

On my system, sizeof(vector) is 24. This probably corresponds to three 8-byte members: capacity, size, and pointer. Additionally, you need to consider the actual allocations, which would be between 1 and 16 bytes (plus allocation overhead) for each inner vector and between 24 and 384 bytes for the outer vector (sizeof(vector<char>) * partitions.capacity()).

I wrote a program to sum this up...

   #include <iostream>
   #include <vector>

   using std::cerr;
   using std::vector;

   struct Partition
      {
      int key;
      std::vector<std::vector<char> > partitions;
      float modularity;
      };

   int main ()
      {
      for ( int Y=1; Y<=16; Y++ )
         {
         const int X = 16/Y;
         if ( X*Y != 16 ) continue; // ignore imperfect geometries

         Partition a;
         a.partitions = vector< vector<char> >( Y, vector<char>(X) );

         int sum = sizeof(a); // main structure
         sum += sizeof(vector<char>) * a.partitions.capacity(); // outer vector
         for ( int i=0; i<(int)a.partitions.size(); i++ )
            sum += sizeof(char) * a.partitions[i].capacity(); // inner vectors

         cerr <<"X="<<X<<", Y="<<Y<<", size = "<<sum<<"\n";
         }
      return 0;
      }

The results show how much memory (not including allocation overhead) is needed for each simple geometry...

X=16, Y=1, size = 80
X=8, Y=2, size = 104
X=4, Y=4, size = 152
X=2, Y=8, size = 248
X=1, Y=16, size = 440

Look at how the "sum" is calculated to see what all of the components are.

The results posted are based on my 64-bit architecture. If you have a 32-bit architecture, the sizes would be almost half as much -- but still a lot more than what you had expected.

In conclusion, std::vector<> is not very space efficient for doing a whole bunch of very small allocations. If your application is required to be efficient, then you should use a different container.


My approach to solving this would probably be to allocate the 16 chars with

std::tr1::array<char,16>

and wrap that with a custom class that maps 2D coordinates onto the array allocation.

Below is a very crude way of doing this, just as an example to get you started. You would have to change this to meet your specific needs -- especially the ability to specify the geometry dynamically.

   #include <tr1/array>

   template< typename T, int YSIZE, int XSIZE >
   class array_2D
      {
      std::tr1::array<T,YSIZE*XSIZE> data;
   public:
      T & operator () ( int y, int x ) { return data[y*XSIZE+x]; } // preferred accessor (avoids pointers)
      T * operator [] ( int index ) { return &data[index*XSIZE]; } // alternative accessor (mimics boost::multi_array syntax)
      };
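For example, hypothetical usage of the sketch above:

   array_2D<char,4,4> grid;  // a 4x4 grid occupying exactly 16 bytes
   grid(1,2) = 'x';          // preferred accessor
   char c = grid[1][2];      // alternative accessor; c == 'x'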
nobar
+2  A: 

While I think he may be overstating the situation just a tad, I'm in general agreement with DeadMG's conclusion that what you're doing is asking for trouble.

Although I'm generally the one looking at (whatever mess somebody has made) and saying "don't do that, just use a vector", this case might well be an exception. You're creating a huge number of objects that should be tiny. Unfortunately, a vector typically looks something like this:

template <class T>
class vector { 
    T *data;
    size_t allocated;
    size_t valid;
public:
    // ...
};

On a typical 32-bit machine, that's twelve bytes already. Since you're using a vector<vector<char> >, you're going to have 12 bytes for the outer vector, plus twelve more for each vector it holds. Then, when you actually store any data in your vectors, each of those needs to allocate a block of memory from the free store. Depending on how your free store is implemented, you'll typically have a minimum block size -- frequently 32 or even 64 bytes. Worse, the heap typically has some overhead of its own, so it'll add some more memory onto each block, for its own book-keeping (e.g., it might use a linked list of blocks, adding another pointer worth of data to each allocation).

Just for grins, let's assume you average four vectors of four bytes apiece, and that your heap manager has a 32-byte minimum block size and one extra pointer (or int) for its bookkeeping (giving a real minimum of 36 bytes per block). Multiplying that out, I get 204 bytes apiece -- close enough to your 209 to believe that's reasonably close to what you're dealing with.

The question at that point is how to deal with the problem. One possibility is to try to work behind the scenes. All the containers in the standard library use allocators to get their memory. While the default allocator gets memory directly from the free store, you can substitute a different one if you choose. If you do some looking around, you can find any number of alternative allocators, many/most of which exist to help with exactly the situation you're in -- reducing wasted memory when allocating lots of small objects. A couple to look at would be the Boost Pool Allocator and the Loki small-object allocator.
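As a sketch of the allocator route (assuming Boost.Pool is available; this is illustrative, not a drop-in fix), the inner vectors can draw their storage from a pool rather than the general-purpose heap:

   #include <vector>
   #include <boost/pool/pool_alloc.hpp>

   // Small same-sized allocations are served from a shared pool,
   // avoiding the per-block bookkeeping of the general heap.
   typedef std::vector<char, boost::pool_allocator<char> > pooled_chars;

   struct Partition
      {
      int key;
      std::vector<pooled_chars> partitions;
      float modularity;
      };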

Another possibility (that can be combined with the first) would be to quit using a vector<vector<char> > at all, and replace it with something like:

char partitions[16];
struct parts { 
    int part0 : 4;
    int part1 : 4;
    int part2 : 4;
    int part3 : 4;
    int part4 : 4;
    int part5 : 4;
    int part6 : 4;
    int part7 : 4;
};

For the moment, I'm assuming a maximum of 8 partitions -- if it could be 16, you can add more to parts. This should probably reduce memory usage quite a bit more, but (as-is) will affect your other code. You could also wrap this up into a small class of its own that provides 2D-style addressing to minimize impact on the rest of your code.
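A minimal sketch of that wrapper idea (the names are hypothetical; it uses plain byte offsets rather than the bitfields above, so partitions can be indexed without a switch):

   struct PackedPartition
      {
      char data[16];          // the 16 chars, stored flat
      unsigned char start[8]; // start[i] = offset of partition i within data

      // Element x of partition p, assuming start[] is kept in order.
      char & at( int p, int x ) { return data[ start[p] + x ]; }
      };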

Jerry Coffin
A: 

...This is a bit of a side conversation, but boost::multi_array was suggested as an alternative to the OP's use of nested vectors. My finding was that multi_array was using a similar amount of memory when applied to the OP's operating parameters.

I derived this code from the example at Boost.MultiArray. On my machine, this showed multi_array using about 10x more memory than ideally required assuming that the 16 bytes are arranged in a simple rectangular geometry.

To evaluate the memory usage, I checked the system monitor while the program was running and I compiled with

( export CXXFLAGS="-Wall -DNDEBUG -O3" ; make main && ./main )

Here's the code...

   #include <iostream>
   #include <vector>
   #include "boost/multi_array.hpp"
   #include <tr1/array>
   #include <cassert>

   #define USE_CUSTOM_ARRAY 0 // compare memory usage of my custom array vs. boost::multi_array

   using std::cerr;
   using std::vector;

   #if USE_CUSTOM_ARRAY
   template< typename T, int YSIZE, int XSIZE >
   class array_2D
      {
      std::tr1::array<T,YSIZE*XSIZE> data;
   public:
      T & operator () ( int y, int x ) { return data[y*XSIZE+x]; } // preferred accessor (avoids pointers)
      T * operator [] ( int index ) { return &data[index*XSIZE]; } // alternative accessor (mimics boost::multi_array syntax)
      };
   #endif

   int main ()
      {
      int COUNT = 1024*1024;

     #if USE_CUSTOM_ARRAY
      vector< array_2D<char,4,4> > A( COUNT );
      typedef int index;
     #else
      typedef boost::multi_array<char,2> array_type;
      typedef array_type::index index;
      vector<array_type> A( COUNT, array_type(boost::extents[4][4]) );
     #endif

      // Assign values to the elements
      int values = 0;
      for ( int n=0; n<COUNT; n++ )
         for ( index i = 0; i != 4; ++i )
            for ( index j = 0; j != 4; ++j )
               A[n][i][j] = values++;

      // Verify values
      int verify = 0;
      for ( int n=0; n<COUNT; n++ )
         for ( index i = 0; i != 4; ++i )
            for ( index j = 0; j != 4; ++j )
               {
               assert( A[n][i][j] == (char)((verify++)&0xFF) );
              #if USE_CUSTOM_ARRAY
               assert( A[n][i][j] == A[n](i,j) ); // testing accessors
              #endif
               }

      cerr <<"spinning...\n";
      while ( 1 ) {} // wait here (so you can check memory usage in the system monitor)

      return 0;
      }
nobar
In my tests I got (with VC++2k8Express, Release mode, Full optimization - I'll try g++ 3.4.4 under cygwin): C_STYLE_ARRAY : 17M; BOOST_ARRAY : 17M; BOOST_MULTI_ARRAY : 108M; VECTOR : 444M. So `boost::multi_array` uses ~6 times more memory than a simple `C style` array or `boost::array`, but it uses far less than `std::vector<std::vector<>>`, just like they claim: "Boost MultiArray is a more efficient and convenient way to express N-dimensional arrays than existing alternatives (especially the std::vector<std::vector<...>> formulation of N-dimensional arrays)."
Eugen Constantin Dinca
With g++ 4.3.4 20090804 under cygwin, -Wall -DNDEBUG -O3, boost 1.43 (I had 1.38 for the VC++2k8 Express test) I got: C_STYLE_ARRAY : 18M, BOOST_ARRAY : 18M, BOOST_MULTI_ARRAY : 92M, VECTOR : 137M
Eugen Constantin Dinca
@Eugen: Your results are consistent with mine. Mine was something like: array: 16M, multi_array: 160M; nested vector: 264M. So, for me, multi_array was more efficient than nested vector, but not significantly so with respect to the best-case size that the OP was looking for.
nobar