I am trying to figure out how C and C++ store large objects on the stack. Usually, the stack is the size of an integer, so I don't understand how larger objects are stored there. Do they simply take up multiple stack "slots"?
By "stack is size of an integer", you mean "the stack pointer is the size of an integer". It points to the top of the stack, which is a huge area of memory. Well, larger than an integer.
You can have objects that are massive enough (or numerous enough) that it won't make sense to put them on the stack. In that case, you can put the object on the heap and put a pointer to it on the stack. This is a difference between pass by value and pass by reference.
How do you define a large object? are we talking greater or less than the size of the stack space allocated?
for example if you have something like this:
void main() {
int reallyreallybigobjectonthestack[1000000000];
}
depending on your system you'll likely get a segfault because there simply isn't enough room to store the object. Otherwise it's stored like any other object. If your talking in actual physical memory then you don't have to worry about this because virtual memory at the operating system level will take care of handling that.
Also the size of the stack is not likely the size of an integer it depends entirely on your operating system and the layout of the applications Virtual Address Space.
In C and C++ you shouldn't store large objects on the stack, because the stack is limited (as you guessed). The stack for each thread is usually only a couple of megabytes or less (it can be specified when creating a thread). When you call "new" to create an object, it isn't put on the stack - it is put on the heap instead.
Stack size is limited. Usually the stack size is set when the process is created. Each thread in that process automatically gets the default stack size if not specified otherwise in the CreateThread() call. So, yes: there can be multiple stack 'slots', but each thread has only one. And they can't be shared between threads.
If you put objects that are larger than the remaining stack size onto the stack, you'll get a stack overflow and your application will crash.
So, if you have very large objects, allocate them on the heap, not on the stack. The heap is only limited by the amount of virtual memory (which is a magnitude bigger than the stack).
The stack is a piece of memory. The stack pointer points to the top. Values can be pushed on the stack and popped to retrieve them.
For example if we have a function which is called with two parameters (1 byte sized and the other 2 byte sized; just assume we have an 8-bit PC).
Both are pushed on the stack this moves the stack pointer up:
03: par2 byte2
02: par2 byte1
01: par1
Now the function is called and the return addres is put on the stack:
05: ret byte2
04: ret byte1
03: par2 byte2
02: par2 byte1
01: par1
OK, within the function we have 2 local variables; one of 2 bytes and one of 4. For these a position is reserved on the stack, but first we save the stack pointer so we know where the variables start by counting up and the parameters are found by counting down.
11: var2 byte4
10: var2 byte3
09: var2 byte2
08: var2 byte1
07: var1 byte2
06: var1 byte1
---------
05: ret byte2
04: ret byte1
03: par2 byte2
02: par2 byte1
01: par1
As you see, you can put anything on the stack as long as you have space left. And else you will get the phenomena that gives this site its name.
Whenever you enter a function, the stack grows to fit the local variables in that function. Given a largeObject
class that uses say 400 bytes:
void MyFunc(int p1, largeObject p2, largeObject *p3)
{
int s1;
largeObject s2;
largeObject *s3;
}
When you call this function, your stack will look something like this (details will vary based on calling convention and architecture):
[... rest of stack ...]
[4 bytes for p1]
[400 bytes for p2]
[4 bytes for p3]
[return address]
[old frame pointer]
[4 bytes for s1]
[400 bytes for s2]
[4 bytes for s3]
See x86 Calling Conventions for some information on how the stack operates. MSDN also has some nice diagrams for a few different calling convections, with Sample Code and resulting stack diagrams.
The stack is a large block of memory that stores local variables, information for returning from function calls, etc. The actual size of the stack varies significantly on the OS. For example, when creating a new thread on Windows, the default size is 1 MB.
If you try to create a stack object that needs more memory than is currently available on the stack, you get a stack overflow and bad things happen. A large class of exploit code purposely tries to create these or similar conditions.
The stack is not divided up into integer-sized chunks. It's just a flat array of bytes. It's indexed by an "integer" of type size_t (not int). If you create a large stack object that fits in the currently-available space, it just uses that space by bumping up (or down) the stack pointer.
As others have pointed out, it's best to use the heap for large objects, not the stack. This avoids stack overflow problems.
EDIT: If you're using a 64-bit application and your OS and runtime libraries are nice to you (see mrree's post), then it should be fine to allocate large temporary objects on the stack. If your application is 32-bit and/or your OS / runtime library is not nice, you'll probably need to allocate these objects on the heap.
Push
and pop
instructions usually aren't used for storing local stack frame variables. At the beginning of the function, the stack frame is set up by decrementing the stack pointer by the number of bytes (aligned to the word size) required by the function's local variables. This allocates the requisite amount of space "on the stack" for these values. All local variables are then accessed via a pointer to this stack frame (ebp
on x86).
As others have said, it's not clear just what you mean by "large objects"... However, since you then ask
Do they simply take up multiple stack "slots"?
I'm going to presume that you simply mean anything larger than an integer. As someone else noted, though, the stack doesn't have integer-sized "slots" -- it's just a section of memory, and every byte in it has its own address. The compiler tracks every variable by the address of the first byte of that variable -- this is the value that you get if you use the address-of operator ( &var
), and the value of a pointer is just this address for some other variable. The compiler also knows what type every variable is (you told it when you declared the variable), and it knows how large each type should be -- when you compile the program, it does whatever math is necessary to figure out how much space those variables will need when a function is called, and includes the result of that in the function entry-point code (the stack frame that PDaddy mentioned).
Stack and Heap are not as different as you think!
True, some operating systems do have stack limitations. (Some of those also have nasty heap limitations too!)
But this isn't 1985 any more.
These days, I run Linux!
My default stacksize is limited to 10 MB. My default heapsize is unlimited. It's pretty trivial to unlimit that stacksize. (*cough* [tcsh] unlimit stacksize *cough*. Or setrlimit().)
The biggest differences between stack and heap are:
- stack allocations just offset a pointer (and possibly allocate new memory-pages if the stack has grown large enough). Heap has to search through its data-structures to find a suitable memory block. (And possibly allocate new memory pages too.)
- stack goes out of scope when the current block ends. Heap goes out of scope when delete/free is called.
- Heap can get fragmented. Stack never gets fragmented.
Under Linux, both stack and heap are manged through virtual memory.
In terms of allocation time, even heap-searching through badly fragmented memory can't hold a candle to mapping in new pages of memory. Time-wise the differences are negligible!
Depending on your OS, oftentimes it's only when you actually use those new memory pages that they are mapped in. (NOT during the malloc() allocation!) (It's a lazy evaluation thing.)
(new would invoke the constructor, which would presumably use those memory pages...)
You can thrash the VM system by creating and destroying large objects on either the stack or the heap. It depends on your OS/compiler whether memory can/is reclaimed by the system. If it's not reclaimed, heap might be able to reuse it. (Assuming it hasn't been repurposed by another malloc() in the meantime.) Similarly, if stack is not reclaimed, it would just be reused.
Though pages that get swapped out would need to be swapped back in, and that's going to be your biggest time-hit.
Of all these things, I worry about memory fragmentation the most!
Lifespan (when it goes out of scope) is always the deciding factor.
But when you run programs for long periods of time, fragmentation creates a gradually increasing memory footprint. The constant swapping eventually kills me!
AMENDED TO ADD:
Man, I have been spoiled!
Something just wasn't adding up here... I figured either *I* was way the hell off base. Or everyone else was. Or, more likely, both. Or, just maybe, neither.
Whatever the answer, I had to know what was going on!
...This is going to be long. Bear with me...
I've spend most of the last 12 years working under Linux. And about 10 years before that under various flavors of Unix. My perspective on computers is somewhat biased. I have been spoiled!
I've done a little with Windows, but not enough to speak authoritatively. Nor, tragically, with Mac OS/Darwin either... Though Mac OS/Darwin/BSD is close enough that some of my knowledge carries over.
With 32-bit pointers, you run out of address space at 4 GB (2^32).
Practically speaking, STACK+*HEAP* combined is usually limited to somewhere between 2-4 GB as other things need to get mapped in there.
(There's shared memory, shared libraries, memory-mapped files, the executable image your running is always nice, etc.)
Under Linux/Unix/MacOS/Darwin/BSD, you can artificially constrain the HEAP or the STACK to whatever arbitrary values you want at runtime. But ultimately there is a hard system limit.
This is the distinction (in tcsh) of "limit" vs "limit -h". Or (in bash) of "ulimit -Sa" vs "ulimit -Ha". Or, programmatically, of *rlim_cur* vs *rlim_max* in struct rlimit.
Now we get to the fun part. With respect to Martin York's Code. (Thank you Martin! Good example. Always good to try things out!.)
Martin's presumably running on a Mac. (A fairly recent one. His compiler build is newer than mine!)
Sure, his code won't run on his Mac by default. But it will run just fine if he first invokes "unlimit stacksize" (tcsh) or "ulimit -Ss unlimited" (bash).
THE HEART OF THE MATTER:
Testing on an ancient (obsolete) Linux RH9 2.4.x kernel box, allocating large amounts of STACK OR HEAP, either one by itself tops out between 2 and 3 GB. (Sadly, the machine's RAM+SWAP tops out at a little under 3.5 GB. It's a 32-bit OS. And this is NOT the only process running. We make do with what we have...)
So there really are no limitations on STACK size vs HEAP size under Linux, other than the artificial ones...
BUT:
On a Mac, there's a hard stacksize limit of 65532 kilobytes. It has to do with how thing are laid out in memory.
Macs appear to stick their Shared system libraries in between at a fixed offset limiting both sides. You can still run Martin York's Code with "unlimit stacksize", since he's only allocating something like 8 MiB (< 64 MiB) of data. But he'll run out of STACK long before he runs out of HEAP.
I'm on Linux. I won't. Sorry kid. Here's a Nickel. Go get yourself a better OS.
In the long run, unless Apple does something really dumb, 64-bit address spaces will render this whole stack-limitation thing obsolete sometime Real Soon Now.
Moving on to Fragmentation:
Anytime you push something onto the STACK it's appended to the end. And it's removed (rolled back) whenever the current block exits.
As a result, there are no holes in the STACK. It's all one big solid block of used memory. With perhaps just a little unused space at the very end all ready for reuse.
In contrast, as HEAP is allocated and free'd, you wind up with unused-memory holes. These can gradually lead to an increased memory footprint over time. Not what we usually mean by a core leak, but the results are similar.
Memory Fragmentation is NOT a reason to avoid HEAP storage. It's just something to be aware of when you're coding.
Which brings up SWAP THRASHING:
- If you already have a large amount of heap allocated / in use.
- If you have lots of fragmented holes scattered about.
- And if you have a large number of small allocations.
Then you can wind up with a large number of variables, all utilized within a small localized region of the code, that are scattered across a great many virtual memory pages. (As in you are using 4 bytes on this 2k page, and 8 bytes on that 2k page, and so on for a whole lot of pages...)
All of which means that your program needs to have a large number of pages swapped in to run. Or it's going to swapping pages in and out constantly. (We call that thrashing.)
On the other hand, had these small allocations been made on the STACK, they would all be located in a contiguous stretch of memory. Fewer VM memory pages would need to be loaded. (4+8+... < 2k for the win.)
Sidenote: My reason for calling attention to this stems from a certain electrical engineer I knew who insisted that all arrays be allocated on the HEAP. We were doing matrix math for graphics. A *LOT* of 3 or 4 element arrays. Managing new/delete alone was a nightmare. Even abstracted away in classes it caused grief!
Next topic. Threading:
Yes, threads are limited to very tiny stacks by default.
You can change that with pthread_attr_setstacksize(). Though depending on your threading implementation, if multiple threads are sharing the same 32 bit address space, large individual per-thread stacks will be a problem! There just ain't that much room! Again, transitioning to 64-bit address spaces (OS's) will help.
pthread_t threadData;
pthread_attr_t threadAttributes;
pthread_attr_init( & threadAttributes );
ASSERT_IS( 0, pthread_attr_setdetachstate( & threadAttributes,
PTHREAD_CREATE_DETACHED ) );
ASSERT_IS( 0, pthread_attr_setstacksize ( & threadAttributes,
128 * 1024 * 1024 ) );
ASSERT_IS( 0, pthread_create ( & threadData,
& threadAttributes,
& runthread,
NULL ) );
With respect to Martin York's Stack Frames:
Perhaps you and I are thinking of different things?
When I think of a stack frame, I think of a call stack. Each function or method has its own stack frame consisting of the return address, arguments, and local data.
I've never seen any limitations on the size of a stack frame. There are limitations on the STACK as a whole, but that's all the stack frames combined.
There's a nice diagram and discussion of stack frames over on Wiki.
On a final note:
Under Linux/Unix/MacOS/Darwin/BSD, it is possible to change the maximum STACK size limitations programmatically as well as limit(tcsh) or ulimit(bash):
struct rlimit limits;
limits.rlim_cur = RLIM_INFINITY;
limits.rlim_max = RLIM_INFINITY;
ASSERT_IS( 0, setrlimit( RLIMIT_STACK, & limits ) );
Just don't try to set it to INFINITY on a Mac... And change it before you try to use it. ;-)
Further reading:
- http://www.informit.com/content/images/0131453483/downloads/gorman_book.pdf
- http://www.redhat.com/magazine/001nov04/features/vm/
- http://dirac.org/linux/gdb/02a-Memory_Layout_And_The_Stack.php
- http://people.redhat.com/alikins/system_tuning.html
- http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html
- http://www.kegel.com/stackcheck/