Stack sizes
Typically, each thread gets a fixed-size stack when it is created. Run ulimit -a to see the default stack size for your system (ulimit -s prints just that value). On my system, it is 8 MiB. When you create new threads, you can give them smaller or larger stacks (see pthread_attr_setstacksize).
When the stack grows beyond 8 MiB, the program will write to an invalid memory location and crash. The kernel makes sure that the memory locations next to the stack are all invalid, to ensure that programs crash when their stacks overflow.
You may think that a fixed size is a waste, but that 8 MiB is virtual memory, not physical memory. The difference is important; see below.
Malloc
On Unix systems, memory allocation has two layers. The user-space layer is malloc (and calloc, realloc, free). This is just part of the C library, and can be replaced with your own code; Firefox does this, and many programming languages use their own allocation scheme instead of malloc. Various malloc implementations are cross-platform.
The lower layer is mmap (and sbrk). The mmap function is a system call which alters your program's address space. One of the things it can do is add new anonymous, private pages into your program's memory.
The purpose of malloc is to get large chunks of virtual memory from the kernel using mmap (or sbrk) and divide them up efficiently for your program. The mmap system call only works in multiples of the page size (4 KiB on most systems).
Memory: virtual versus real
Remember that the stack and all of the memory returned by mmap are just virtual memory, not physical RAM. The kernel doesn't allocate physical RAM to your process until you actually use it.
When you get anonymous memory from the kernel, either on the heap or the stack, it's filled with zeroes. Instead of giving you hundreds of pages of physical RAM pre-filled with zeroes, however, the kernel makes all of that virtual memory share one single page of physical RAM. The virtual memory is marked read-only. As soon as you write to a page, the CPU raises an exception, transfers control to the kernel, and the kernel allocates a fresh, writable, zeroed page for your program.
This explains why:
- calloc is faster than malloc + memset (because calloc knows that the mmap'd pages are pre-zeroed, and memset forces the allocation of physical RAM)
- You can allocate much more memory than combined RAM + swap (because it's not used until you write to it)