The only way in which std::stack
is significantly slower than the processor stack is that it has to allocate memory from the free store. By default, it uses std::deque
for storage, which allocates memory in chunks as needed. As long as you don't keep destroying and recreating the stack, it will keep that memory and not need to allocate more unless it grows bigger than before. So structure code like this:
std::stack<int> stack;
for (int i = 0; i < HUGE_NUMBER; ++i)
do_lots_of_work(stack); // uses stack
rather than:
for (int i = 0; i < HUGE_NUMBER; ++i)
do_lots_of_work(); // creates its own stack
If, after profiling, you find that it's still spending too long allocating memory, then you could preallocate a large block so you only need a single allocation when your program starts up (assuming you can find an upper limit for the stack size). You need to get into the innards of the stack to do this, but it is possible by deriving your own stack type. Something like this (not tested):
class PreallocatedStack : public std::stack< int, std::vector<int> >
{
public:
explicit PreallocatedStack(size_t size) { c.reserve(size); }
};
EDIT: this is quite a gruesome hack, but it is supported by the C++ Standard. More tasteful would be to initialise a stack with a reserved vector, at the cost of an extra allocation. And don't try to use this class polymorphically - STL containers aren't designed for that.
Using the processor stack won't be portable, and on some platforms might make it impossible to use local variables after pushing something - you might end up having to code everything in assembly. (That is an option, if you really need to count every last cycle and don't need portability, but make sure you use a profiler to check that it really is worthwhile). There's no way to use another thread's stack that will be faster than a stack container.