



Before I write my own, I will ask all y'all.

I'm looking for a C++ class that is almost exactly like an STL vector but stores its data in an array on the stack. Some kind of STL allocator class would also work, but I am trying to avoid any kind of heap, even statically allocated per-thread heaps (although one of those is my second choice). The stack is just more efficient.

It needs to be almost a drop in replacement for current code that uses a vector.

For what I was about to write myself I was thinking of something like this:

char buffer[4096];
stack_vector<match_item> matches(buffer, sizeof(buffer));

Or the class could have buffer space allocated internally. Then it would look like:

stack_vector<match_item, 256> matches;

I was thinking it would throw std::bad_alloc if it runs out of space, although that should not ever happen.
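A minimal sketch of the second variant (buffer inside the object), assuming C++11 for `alignas` and `= delete`. The name `stack_vector`, the capacity template parameter and the choice to throw `std::bad_alloc` on overflow come from the question; the rest is illustrative. Only a small slice of the `std::vector` interface is provided, so by itself this is not a drop-in replacement.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity vector: elements live in a raw buffer inside the
// object, so a local stack_vector keeps its elements on the stack.
template <typename T, std::size_t N>
class stack_vector {
public:
    stack_vector() : size_(0) {}
    ~stack_vector() { clear(); }

    // Copying would need element-by-element construction; omitted in this sketch.
    stack_vector(const stack_vector&) = delete;
    stack_vector& operator=(const stack_vector&) = delete;

    void push_back(const T& value) {
        if (size_ == N) throw std::bad_alloc();               // out of in-object space
        ::new (static_cast<void*>(data() + size_)) T(value);  // placement new into the buffer
        ++size_;
    }

    void pop_back() { data()[--size_].~T(); }  // precondition: !empty()
    void clear() { while (size_ > 0) pop_back(); }

    T& operator[](std::size_t i) { return data()[i]; }
    const T& operator[](std::size_t i) const { return data()[i]; }
    T* begin() { return data(); }
    T* end() { return data() + size_; }

    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }
    std::size_t capacity() const { return N; }

private:
    T* data() { return reinterpret_cast<T*>(storage_); }
    const T* data() const { return reinterpret_cast<const T*>(storage_); }

    alignas(T) unsigned char storage_[sizeof(T) * N];  // raw bytes, aligned for T
    std::size_t size_;
};
```

Usage would then look like the question's second form: `stack_vector<match_item, 256> matches;` followed by `matches.push_back(item);`. Raw storage plus placement new (rather than `T storage_[N]`) avoids default-constructing `N` elements up front.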


Using Chromium's stack_container.h works great!

The reason I hadn't thought of doing it this way myself is that I had always overlooked the allocator object parameter to the STL container constructors. I have used the template parameter a few times to do static pools, but I'd never seen or written any code that actually used the object parameter. I learned something new. Very cool!

The code is a bit messy, and for some reason GCC forced me to declare the allocator as a named variable instead of constructing it inline as the vector's allocator argument. It went from something like this:

typedef std::pair< const char *, const char * > comp_list_item;
typedef std::vector< comp_list_item > comp_list_type;

comp_list_type match_list;

To this:

static const size_t comp_list_alloc_size = 128;
typedef std::pair< const char *, const char * > comp_list_item;
typedef StackAllocator< comp_list_item, comp_list_alloc_size > comp_list_alloc_type;
typedef std::vector< comp_list_item, comp_list_alloc_type > comp_list_type;

comp_list_alloc_type::Source match_list_buffer;
comp_list_alloc_type match_list_alloc( &match_list_buffer );
comp_list_type match_list( match_list_alloc );
match_list.reserve( comp_list_alloc_size );

And I have to repeat that whenever I declare a new one. But it works just like I wanted.
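One likely explanation for the GCC behavior, assuming the inline version looked like `comp_list_type match_list( comp_list_alloc_type( &match_list_buffer ) );`, is C++'s "most vexing parse": inside a declaration, `T( &name )` reads as a parameter `name` of type `T&`, so that line declares a function rather than a vector. The sketch below reproduces it with hypothetical stand-in types (`Source`, `Alloc`, `Container`) instead of Chromium's real classes, and assumes C++11 for `decltype`, brace initialization and `static_assert`.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the Source / allocator / vector trio above.
struct Source {};

struct Alloc {
    explicit Alloc(Source* s) : src(s) {}
    Source* src;
};

struct Container {
    explicit Container(const Alloc& a) : src(a.src) {}
    Source* src;
};

Source buffer;

// Reads like "construct c1 from a temporary Alloc", but it declares a function
// named c1 that takes a parameter `Alloc& buffer` and returns a Container.
Container c1(Alloc(&buffer));

// Two ways to force an object definition instead:
Container c2((Alloc(&buffer)));  // extra parentheses (also works in C++03)
Container c3{Alloc(&buffer)};    // brace initialization (C++11)

static_assert(std::is_function<decltype(c1)>::value, "c1 is a function declaration");
static_assert(!std::is_function<decltype(c2)>::value, "c2 is an object");
```

Applied to the code above, the extra-parentheses form should let you drop the named `match_list_alloc`. The `Source` buffer still needs its own declaration, because it has to outlive the vector.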

I noticed that stack_container.h also defines a StackVector, and I tried using it. But it doesn't inherit from vector or define the same methods, so it wasn't a drop-in replacement. I didn't want to rewrite all the code using the vector, so I gave up on it.

+1  A: 

You can use your own allocator for std::vector and have it allocate chunks of your stack-based storage, similar to your example. The allocator class is the second template parameter.

Edit: I've never tried this, and looking at the documentation further leads me to believe you can't write your own allocator. I'm still looking into it.
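A minimal sketch of such an allocator, assuming C++11's minimal allocator requirements (`std::allocator_traits` supplies the rest). `arena`, `arena_allocator` and `arena_vector` are hypothetical names. Allocation is a bump pointer over caller-supplied storage, `deallocate` does not reclaim space, and running out throws `std::bad_alloc` instead of touching the heap, so it works best with one `reserve()` up front.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bump-pointer arena over caller-supplied storage.
// `base` is assumed to be aligned to alignof(std::max_align_t).
struct arena {
    unsigned char* base;
    std::size_t size;
    std::size_t used;
};

template <typename T>
struct arena_allocator {
    using value_type = T;

    explicit arena_allocator(arena* a) noexcept : a_(a) {}
    template <typename U>
    arena_allocator(const arena_allocator<U>& other) noexcept : a_(other.a_) {}

    T* allocate(std::size_t n) {
        // Round the current offset up to T's alignment.
        std::size_t offset = (a_->used + alignof(T) - 1) / alignof(T) * alignof(T);
        if (offset > a_->size || n > (a_->size - offset) / sizeof(T))
            throw std::bad_alloc();  // never falls back to the heap
        a_->used = offset + n * sizeof(T);
        return reinterpret_cast<T*>(a_->base + offset);
    }

    // Bump allocation: individual blocks are not reclaimed.
    void deallocate(T*, std::size_t) noexcept {}

    arena* a_;  // public so that rebound copies can read it
};

template <typename T, typename U>
bool operator==(const arena_allocator<T>& x, const arena_allocator<U>& y) noexcept {
    return x.a_ == y.a_;
}
template <typename T, typename U>
bool operator!=(const arena_allocator<T>& x, const arena_allocator<U>& y) noexcept {
    return !(x == y);
}

template <typename T>
using arena_vector = std::vector<T, arena_allocator<T>>;
```

Because every growth step leaves the old block behind, a vector that outgrows its `reserve()` burns through the arena quickly. That is the trade-off for never calling the heap.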

Mark Ransom
Bumped you because you were close to right. I didn't realize it either, but you *can* make an allocator do this.
Zan Lynx

I've wanted something like this, but never enough to write one.

I would just take the std::vector you have now from your header files, remove the dynamic memory management, add the constructor you suggest, and make whatever other changes are required to get it to compile.

You'll want to make the copy constructor and operator= private so that no one tries to return it from a function or does anything else that will cause memory corruption.
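A minimal sketch of that advice, using the external-buffer constructor style from the question. `buffer_vector` is a hypothetical name, and the caller is assumed to pass an array of already-constructed `T`s (for example `match_item storage[256]`). The copy constructor and `operator=` are declared private and never defined, which is the C++03 idiom; the `static_assert` checks at the end use C++11's `<type_traits>`.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical vector-like wrapper over caller-supplied storage.
template <typename T>
class buffer_vector {
public:
    buffer_vector(T* storage, std::size_t capacity)
        : data_(storage), capacity_(capacity), size_(0) {}

    void push_back(const T& value) {
        if (size_ == capacity_) throw std::bad_alloc();
        data_[size_++] = value;  // assigns into an already-constructed T
    }
    void pop_back() { --size_; }
    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    // C++03 idiom: private and never defined, so copies fail to compile
    // (and accidental use from inside the class fails to link).
    buffer_vector(const buffer_vector&);
    buffer_vector& operator=(const buffer_vector&);

    T* data_;
    std::size_t capacity_;
    std::size_t size_;
};

// A copy would share, and could outlive, the original's stack buffer.
static_assert(!std::is_copy_constructible<buffer_vector<int> >::value, "copying is rejected");
static_assert(!std::is_copy_assignable<buffer_vector<int> >::value, "assignment is rejected");
```

In C++11 and later, writing `= delete` on the two members gives the same protection with a clearer error message.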

David Norman
+1  A: 

Why do you want to put it on the stack in particular? If you have an implementation of alloca(), you could build an allocator class using that instead of malloc(), but your idea of using a statically allocated array is even better: it's just as fast on most architectures, and you don't risk stack corruption if you mess up.

Charlie Martin
On the stack because the program is multi-threaded and multi-architecture and I don't want to mess with all the different ways to get TLS storage. Not on the heap because of thread contention on the heap lock.
Zan Lynx
Allocate a per-thread block on the heap and use it the same way as a stack-allocated block?
Requires TLS to store the pointer to "this thread's" allocated block.
Steve Jessop
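Since C++11 the language has a portable spelling for this kind of TLS, `thread_local`, which removes the per-platform plumbing. A minimal sketch of the per-thread-block suggestion, assuming a C++11 compiler: each thread heap-allocates one block on first use and gets the same pointer back afterwards, so the heap lock is taken once per thread rather than once per container. `thread_block` and `kBlockSize` are hypothetical names.

```cpp
#include <cstddef>
#include <memory>

const std::size_t kBlockSize = 64 * 1024;  // hypothetical per-thread budget

// Returns this thread's scratch block. It is allocated on the heap the first
// time a given thread calls this function and freed when that thread exits.
unsigned char* thread_block() {
    thread_local std::unique_ptr<unsigned char[]> block(new unsigned char[kBlockSize]);
    return block.get();
}
```

The returned block could then back a bump-pointer allocator exactly as a stack array would, with each thread reusing its own block.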
+13  A: 

You don't have to write a completely new container class. You can stick with your STL containers, but change the second template parameter of, for example, std::vector to give it a custom allocator that allocates from a stack buffer. The Chromium authors wrote an allocator just for this:

It works by allocating a buffer of the size you specify. You create the container and call container.reserve(buffer_size). If you overflow that size, the allocator automatically gets elements from the heap (since it is derived from std::allocator, in that case it just uses the facilities of the standard allocator). I haven't tried it, but it's from Google, so I think it's worth a try.

Usage is like this:

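#include <iostream>
#include "stack_container.h"  // from Chromium; adjust the path to your checkout

int main() {
    StackVector<int, 128> s;
    s->push_back(42); // overloaded operator-> forwards to the underlying std::vector
    s->push_back(43); // v[1] below needs a second element
    // to get the real std::vector.
    StackVector<int, 128>::ContainerType & v = s.container();
    std::cout << v[0] << " " << v[1] << std::endl;
}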
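The mechanism described in this answer (one in-object buffer, heap fallback once you overflow it) can be sketched in a few dozen lines. This is not Chromium's code. It is a hypothetical C++11 illustration of the same idea: `fallback_source` holds one aligned buffer, and `fallback_allocator` hands it out for a single allocation of up to `N` elements, sending every other request, including growth past `N`, to `std::allocator`. It uses composition rather than deriving from `std::allocator`, and copies rebound to another element type never use the buffer.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical storage: room for N elements of T, usable by one allocation at a time.
template <typename T, std::size_t N>
struct fallback_source {
    alignas(T) unsigned char buffer[sizeof(T) * N];
    bool in_use = false;
};

template <typename T, std::size_t N>
class fallback_allocator {
public:
    using value_type = T;
    // Spelled out because allocator_traits cannot rebind a non-type parameter like N.
    template <typename U> struct rebind { using other = fallback_allocator<U, N>; };

    explicit fallback_allocator(fallback_source<T, N>* src) noexcept : src_(src) {}
    // A copy for a different element type cannot use the T-sized buffer.
    template <typename U>
    fallback_allocator(const fallback_allocator<U, N>&) noexcept : src_(nullptr) {}

    T* allocate(std::size_t n) {
        if (src_ && !src_->in_use && n <= N) {
            src_->in_use = true;
            return reinterpret_cast<T*>(src_->buffer);  // in-object buffer
        }
        return std::allocator<T>().allocate(n);  // heap fallback
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (src_ && p == reinterpret_cast<T*>(src_->buffer)) {
            src_->in_use = false;  // buffer becomes available again
            return;
        }
        std::allocator<T>().deallocate(p, n);
    }

    fallback_source<T, N>* source() const noexcept { return src_; }

private:
    fallback_source<T, N>* src_;
};

template <typename T, std::size_t N>
bool operator==(const fallback_allocator<T, N>& a, const fallback_allocator<T, N>& b) noexcept {
    return a.source() == b.source();
}
template <typename T, std::size_t N>
bool operator!=(const fallback_allocator<T, N>& a, const fallback_allocator<T, N>& b) noexcept {
    return !(a == b);
}

template <typename T, std::size_t N>
using fallback_vector = std::vector<T, fallback_allocator<T, N>>;
```

The single `in_use` flag is why the `reserve()` call matters: it makes the first allocation exactly the one that lands in the buffer, and the buffer is only handed out again after the vector has moved its elements elsewhere.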
Johannes Schaub - litb
I think you mean "container.reserve(buffer_size)". Reversing a container is subtly different, and not a member function ;-)
Steve Jessop
oops, indeed. thanks :)
Johannes Schaub - litb
Slight worry in that stack_container.h: "IMPORTANT: Take care to ensure that stack_buffer_ is aligned since it is used to mimic an array of T. Be careful while declaring any unaligned types (like bool) before stack_buffer_." Eek! Order of automatics on the stack is not defined IIRC.
Steve Jessop
Yeah, that sounds naughty. Better to wrap it in a union with the largest possible primitive type. The only idea I have right now: union { unsigned long _a; long double _b; char stack_buffer_[sizeof(T[stack_capacity])]; }; and hope that makes the alignment good enough.
Johannes Schaub - litb
Otherwise, we still have __attribute__ weapons and whatever Visual C++ provides :D
Johannes Schaub - litb
Wonder why the authors didn't do it - maybe there's something that can't be done portably. uint64_t? I think one of the ARM ABIs 8-aligns 64bit integers, so perhaps the caller has to do something non-portable on such platforms anyway.
Steve Jessop
Assuming N is a multiple of sizeof(uint32_t), wouldn't union { uint32_t _x[N / sizeof(uint32_t)]; uint8_t buffer[N]; }; guarantee that buffer is aligned on a 32-bit boundary, since _x has to be?
Evan Teran
Sure, for the same reason as litb's union with long int. But it's platform-specific whether 4 bytes is enough alignment for every type.
Steve Jessop
Fair enough. I suppose if you only type-pun buffer to uint32_t* or pointers to smaller types, it should always be well-defined behavior, then?
Evan Teran
I've recently read about boost::type_with_alignment<N>::type, which could solve exactly this problem.
Johannes Schaub - litb
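On a C++11 compiler, the alignment worry in this thread has a direct fix: `alignas(T)` on the buffer member makes its alignment independent of whatever is declared before it. The sketch below places that next to the pre-C++11 union trick from the comments, where `long double` stands in for "the strictest type" (an assumption that, as noted above, is not guaranteed on every platform). `aligned_buffer`, `union_buffer` and `is_aligned` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// C++11: alignment stated directly; the bool declared first no longer matters.
template <typename T, std::size_t N>
struct aligned_buffer {
    bool flag;  // an "unaligned" member before the buffer, as the Chromium comment warns about
    alignas(T) unsigned char bytes[sizeof(T) * N];
    T* data() { return reinterpret_cast<T*>(bytes); }
};

// Pre-C++11 union trick: the buffer shares storage (and therefore alignment)
// with strictly aligned members, hoping one of them is strict enough.
template <typename T, std::size_t N>
union union_buffer {
    unsigned long a;
    long double b;
    unsigned char bytes[sizeof(T) * N];
};

// True if p is suitably aligned for an object of type T.
template <typename T>
bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

static_assert(alignof(aligned_buffer<double, 4>) >= alignof(double), "aligned for double");
static_assert(alignof(union_buffer<double, 4>) >= alignof(long double), "union takes its strictest member's alignment");
```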
+1  A: 

tr1::array partially matches your description. It lacks things like push_back(), but it might be worth a look as a starting point. Wrapping it and adding an index to the "back" to support push_back() and friends should be fairly easy.
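A sketch of that wrapper, assuming C++11's `std::array` (the standardized form of `tr1::array`). `array_vector` is a hypothetical name. Because `std::array` holds `N` real elements, `T` must be default-constructible and assignable, and `pop_back` only shrinks the count rather than destroying anything.

```cpp
#include <array>
#include <cstddef>
#include <new>

template <typename T, std::size_t N>
class array_vector {
public:
    typedef typename std::array<T, N>::iterator iterator;

    void push_back(const T& value) {
        if (size_ == N) throw std::bad_alloc();  // same choice as in the question
        items_[size_++] = value;
    }
    void pop_back() { --size_; }  // precondition: size() > 0
    T& back() { return items_[size_ - 1]; }
    T& operator[](std::size_t i) { return items_[i]; }
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }
    iterator begin() { return items_.begin(); }
    iterator end() { return items_.begin() + static_cast<std::ptrdiff_t>(size_); }

private:
    std::array<T, N> items_;  // all N elements exist; only the first size_ are in use
    std::size_t size_ = 0;    // index one past the "back"
};
```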

+4  A: 

Some options you may want to look at:

STLSoft by Matthew Wilson (author of Imperfect C++) has an auto_buffer template class that puts a default array on the stack but if it grows larger than the stack allocation will grab the memory from the heap. I like this class - if you know that your container sizes are generally going to be bounded by a rather low limit, then you get the speed of a local, stack allocated array. However, for the corner cases where you need more memory, it all still works properly.

Note that the implementation I use myself is not STLSoft's, but an implementation that borrows heavily from it.
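A much simplified sketch of the auto_buffer idea, not STLSoft's implementation or interface: `small_buffer` (a hypothetical name) uses an internal array of `N` elements when the requested size fits and heap-allocates otherwise. It assumes C++11 and simple element types such as `int` or `char` (default-constructible, copy-assignable), and it only grows.

```cpp
#include <algorithm>
#include <cstddef>

template <typename T, std::size_t N>
class small_buffer {
public:
    explicit small_buffer(std::size_t n)
        : data_(n <= N ? internal_ : new T[n]), size_(n), capacity_(n <= N ? N : n) {}
    ~small_buffer() { if (data_ != internal_) delete[] data_; }

    small_buffer(const small_buffer&) = delete;
    small_buffer& operator=(const small_buffer&) = delete;

    // Grows if needed, keeping existing elements; never moves back to internal_.
    void resize(std::size_t n) {
        if (n > capacity_) {
            T* bigger = new T[n];
            std::copy(data_, data_ + size_, bigger);
            if (data_ != internal_) delete[] data_;
            data_ = bigger;
            capacity_ = n;
        }
        size_ = n;
    }

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    bool uses_internal() const { return data_ == internal_; }

private:
    T internal_[N];  // in-object storage (on the stack for a local small_buffer)
    T* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```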

"The Lazy Programmer" wrote a post about an implementation of a container that uses alloca() for its storage. I'm not a fan of this technique, but I'll let you decide for yourself whether it's what you want:

Then there's boost::array, which has none of the dynamic sizing behavior of the first two, but gives you more of the vector interface than just using pointers as iterators, as you do with built-in arrays (i.e., you get begin(), end(), size(), etc.):
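`boost::array` was later standardized as `std::array` in C++11. A short sketch of what that interface buys over a built-in array: the size is fixed and there is no `push_back`, but `begin()`, `end()` and `size()` let standard algorithms work directly. `sum` is a hypothetical helper for illustration.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// std::array<T, N> is an aggregate that holds its N elements inline (no heap),
// yet exposes begin()/end()/size() like the other standard containers.
template <std::size_t N>
int sum(const std::array<int, N>& a) {
    return std::accumulate(a.begin(), a.end(), 0);
}
```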

Michael Burr
alloca is awesome, but on some platforms you may not be able to detect allocation failure until you receive SIGSEGV. I believe this is the case on Linux.
+2  A: 

If speed matters, here are the run times I see:

  • 4 ns int[10], fixed size on the stack
  • 40 ns <vector>
  • 1300 ns <stlsoft/containers/pod_vector.hpp>

for one stupid test (below): just 2 push_backs, reading v[0] and v[1], then 2 pop_backs, on one platform only: Mac PPC, gcc-4.2 -O3. (I have no idea whether Apple has optimized its STL.)

Don't accept any timings you haven't faked yourself. And of course every usage pattern is different. Nonetheless, factors greater than 2 surprise me.

(If mems, memory accesses, are the dominant factor in runtimes, what are all the extra mems in the various implementations ?)

// #include <stlsoft/containers/pod_vector.hpp>  // needed only for the pod_vector variants
#include <stdio.h>
#include <stdlib.h>
#include <vector>
using namespace std;

int main( int argc, char* argv[] )
{
        // times for 2 push, v[0] v[1], 2 pop, mac g4 ppc gcc-4.2 -O3 --
    // Vecint10 v;  // stack int[10]: 4 ns (Vecint10 is not shown here)
    vector<int> v;  // 40 ns
    // stlsoft::pod_vector<int> v;  // 1300 ns
    // stlsoft::pod_vector<int, std::allocator<int>, 64> v;

    int n = (argc > 1 ? atoi( argv[1] ) : 10) * 1000000;
    unsigned sum = 0;  // unsigned wraps instead of overflowing (signed overflow is undefined)

    while( --n >= 0 ){
        v.push_back( n );
        v.push_back( n );
        sum += v[0] + v[1];
        v.pop_back();
        v.pop_back();
    }
    printf( "sum: %u\n", sum );
    return 0;
}
