The clone() system call on Linux takes a parameter pointing to the stack for the newly created thread to use. The obvious way to do this is to simply malloc some space and pass that, but then you have to be sure you've malloc'd as much stack space as that thread will ever use (hard to predict).

I remembered that when using pthreads I didn't have to do this, so I was curious what it did instead. I came across this site which explains, "The best solution, used by the Linux pthreads implementation, is to use mmap to allocate memory, with flags specifying a region of memory which is allocated as it is used. This way, memory is allocated for the stack as it is needed, and a segmentation violation will occur if the system is unable to allocate additional memory."

The only context I've ever heard mmap used in is for mapping files into memory, and indeed reading the mmap man page it takes a file descriptor. How can this be used for allocating a stack of dynamic length to give to clone()? Is that site just crazy? ;)

In either case, doesn't the kernel need to know how to find a free bunch of memory for a new stack anyway, since that's something it has to do all the time as the user launches new processes? Why does a stack pointer even need to be specified in the first place if the kernel can already figure this out?

A: 

mmap is more than just mapping a file into memory. In fact, some malloc implementations use mmap for large allocations. If you read the fine man page you'll notice the MAP_ANONYMOUS flag, and you'll see that you need not supply a file descriptor at all.

As for why the kernel can't just "find a bunch of free memory", well if you want someone to do that work for you, either use fork instead, or use pthreads.

Logan Capaldo
My point is that it should be able to "find a bunch of free memory" because apparently it *already can* "find a bunch of free memory." Fork creates a new process, which is different, and I know I could abstract any detail away by using a library. But I'm giving the kernel developers credit and assuming there's good reason for things to work this way, and I want to know why.
Joseph Garvin
fork (exec really, since fork just copies everything) are the "find me a bunch of free memory" functions. `clone` is the "I want to control the details of my process creation" function. pthread_create is the "create me a thread, use the defaults" function. These are your choices. New threads need their own stack, and you can't use the traditional method of allocating a stack (start at the top/bottom of the (user) address space and grow down/up towards the heap, which is growing the other way), because there's only one top/bottom of the address space.
Logan Capaldo
My point is that when the user forks a process, even though that process gets its own memory space, at some level the kernel has to be doing memory management of the *physical* address space that the process-specific address spaces map into. If it can do that, it should have the logic to be able to handle letting me clone but handle the stack for me. I might want to use clone for reasons that don't have anything to do with stacks for example (see the clone flags).
Joseph Garvin
Mapping physical address space to virtual address space has nothing to do with allocating memory for the stack. I don't see why you think they're related?
Logan Capaldo
@Logan: Sure it does ;) When I launch a new process, it needs a stack. Even though the process has its own virtual address space, its stack (and therefore the beginning of its real address space) has to correspond to somewhere in the physical address space. Multiple processes with variable size stacks and heaps means the kernel has to be doing this memory management already. Unless I'm forgetting something...
Joseph Garvin
The kernel does memory management at a lower layer. You can tell it to use 100 MB as a stack. It won't use a single byte of that 100 MB (it's just virtual address space, after all) until you actually start using it; it faults in physical memory pages as they are accessed. You'll use only as much of the stack's memory as is needed, and it will "grow" within the size of the mmap. The bad thing, of course, is that you need to set a fixed maximum stack size; the mapping itself cannot grow. Some OSes let you specify flags to mmap that allow it to grow automatically, but last I looked, which was quite some years ago, Linux did not.
nos
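The lazy allocation described in the comment above can be observed directly. This is a minimal sketch, assuming Linux with mincore(2) available; the helper names `reserve_region` and `resident_pages` are made up for illustration. It reserves anonymous memory and counts how many of its pages are actually resident.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve len bytes of anonymous, private memory.
 * Returns NULL on failure. No physical pages are allocated yet. */
void *reserve_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: count the pages of [addr, addr + len) that are
 * resident in RAM, using mincore(2). addr must be page-aligned.
 * Returns (size_t)-1 on error. */
size_t resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    size_t count = 0;

    if (vec == NULL)
        return (size_t)-1;
    if (mincore(addr, len, vec) == -1) {
        free(vec);
        return (size_t)-1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;    /* low bit set: page is resident */
    free(vec);
    return count;
}
```

Calling `resident_pages` on a fresh 100 MB region from `reserve_region` should report no resident pages; after writing a single byte, at least one page becomes resident while most of the region stays untouched. The exact count after a write depends on the page size and on whether transparent huge pages are in use.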
Joseph, noselasd is correct here. Mapping virtual to physical memory (and swap) happens independently of whether or not the memory is intended to be used as a stack or heap or something else. That part of the kernel doesn't need to be aware of that distinction.
Logan Capaldo
If the kernel is smart enough to not use it unless it's used, why not always specify the maximum? (all of memory)
Joseph Garvin
@noslead: Also, I assume the situation you're describing is if you pass a pointer produced by mmap as the stack.
Joseph Garvin
@joseph, because the virtual memory space is finite. There are, for example, shared libraries, which are mmapped into the virtual memory space. There's the executable code itself, and there's the data space (global variables, malloced memory), a somewhat special mapping that can be extended with the sbrk system call. And there are mmapped files that the application may want to map into memory too. These mappings cannot overlap, and they need to have different protections (read/write/exec). Sure, you could specify all available memory, but that would clash with the space needed for shared libraries and dynamic memory.
nos
http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory probably a better overview
nos
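The mappings listed in the comments above (executable, shared libraries, heap, stack) can be inspected on a live process. A minimal sketch, assuming Linux with /proc mounted; the function names `print_maps` and `maps_has_region` are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Print every mapping of the calling process, one per line.
 * Returns 0 on success, -1 if /proc/self/maps cannot be opened. */
int print_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        fputs(line, stdout);
    fclose(f);
    return 0;
}

/* Return 1 if some line of /proc/self/maps contains `name`
 * (for example "[stack]"), 0 if none does, -1 on read failure. */
int maps_has_region(const char *name)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, name) != NULL) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Each line printed by `print_maps` shows an address range, its protections (r/w/x), and what backs it. The main thread's stack appears as `[stack]`; a stack obtained from mmap for clone() shows up as an ordinary anonymous mapping.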
+2  A: 

You'd want the MAP_ANONYMOUS flag for mmap, and MAP_GROWSDOWN as well, since you want to use it as a stack.

Something like:

void *stack = mmap(NULL,initial_stacksize,PROT_WRITE|PROT_READ,MAP_PRIVATE|MAP_GROWSDOWN|MAP_ANONYMOUS,-1,0);

See the mmap man page for more info. And remember, clone is a low-level concept that you're not meant to use unless you really need what it offers. And it offers a lot of control, like setting the child's own stack, just in case you want to do some trickery (like having the stack accessible in all the related processes). Unless you have a very good reason to use clone, stick with fork or pthreads.

nos
How does this get you a dynamically growing stack though? Don't you still have to specify a length? Or do implementations like pthreads pass a gigantic length and rely on copy on write?
Joseph Garvin
Yes, they rely on copy on write. I'm not sure how big the pthread stack size is now; it used to be 2 MB by default. You can alter it with the ulimit -s command.
nos
Ok, testing with pthread_attr_getstacksize suggests the default stack size is 10485760 bytes nowadays, and
nos
@nos: I think your comment was cut off after "and".
Joseph Garvin
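The default thread stack size discussed in the comments above can be queried rather than guessed. A minimal sketch using pthread_attr_getstacksize and getrlimit; the wrapper names `default_thread_stack_size` and `stack_soft_limit` are made up for illustration, and the program may need to be built with -pthread on older glibc versions.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical wrapper: stack size a thread created with default
 * attributes would get. Returns 0 on error. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Hypothetical wrapper: soft RLIMIT_STACK, the value `ulimit -s`
 * reports (in bytes here, not KiB). Returns 0 on error; the result
 * is RLIM_INFINITY when the limit is unlimited. */
rlim_t stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}
```

On glibc the two values are usually related, because the default thread stack size is taken from the soft stack limit at program start when that limit is not unlimited. Other C libraries may use a fixed default instead, so treat the relationship as an assumption to check on your system.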
+1  A: 

Note that the clone system call doesn't take an argument for the stack location. It actually works just like fork. It's just the glibc wrapper which takes that argument.

agl
Are you sure? Every signature I can find online for it includes a child stack. If the system call doesn't need it why does glibc?
Joseph Garvin
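One way to explore the question above is to invoke the raw system call through syscall(2), bypassing the glibc wrapper. A minimal sketch, assuming Linux on an architecture such as x86-64 where the flags are the first raw argument; the function name `clone_like_fork` is made up for illustration. On Linux the raw call does have a stack argument, but when it is passed as 0 and CLONE_VM is not set, the child keeps running at the same stack address in its own copy-on-write copy of the address space, much as after fork().

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child with the raw clone system call,
 * passing 0 for the stack. The child exits immediately with
 * child_exit_code; the parent returns that code, or -1 on error. */
int clone_like_fork(int child_exit_code)
{
    int status;
    /* flags = SIGCHLD only: no CLONE_VM, so memory is not shared.
     * All remaining arguments (stack, tid pointers, tls) are 0. */
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(child_exit_code);     /* child: return value is 0, as with fork() */

    if (waitpid((pid_t)pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The glibc clone() wrapper insists on a stack because it starts the child in a separate function, which needs somewhere to run. With CLONE_VM set, a stack of its own is required in practice, since parent and child would otherwise write to the same stack memory.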
A: 

Joseph, in answer to your last question:

When a user creates a "normal" new process, that's done by fork(). In this case, the kernel doesn't have to worry about creating a new stack at all, because the new process is a complete duplicate of the old one, right down to the stack.

If the user replaces the currently running process using exec(), then the kernel does need to create a new stack - but in this case that's easy, because it gets to start from a blank slate. exec() wipes out the memory space of the process and reinitialises it, so the kernel gets to say "after exec(), the stack always lives HERE".

If, however, we use clone(), then we can say that the new process will share a memory space with the old process (CLONE_VM). In this situation, the kernel can't leave the stack as it was in the calling process (like fork() does), because then our two processes would be stomping on each other's stack. The kernel also can't just put it in a default location (like exec() does), because that location is already taken in this memory space. The only solution is to allow the calling process to find a place for it, which is what it does.

caf
A: 

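Here is code that mmaps a stack region and tells clone() to use that region as the child's stack.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>

/* Runs in the child, on the stack passed to clone(). */
int execute_clone(void *arg)
{
        (void)arg;
        printf("\nclone function executed\n");
        fflush(stdout);
        return 0;
}

int main(void)
{
        size_t len = 0x200000;  /* 2 MiB of stack */
        void *ptr;
        int rc;

        /* Anonymous mappings take fd -1, and a stack must be readable as
           well as writable. The kernel picks the address here: MAP_FIXED at
           a hard-coded address can silently replace an existing mapping. */
        ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN, -1, 0);
        if (ptr == MAP_FAILED) {
                perror("mmap failed");
                return 1;
        }

        /* The stack grows down, so pass the top of the region.
           SIGCHLD lets the parent reap the child with waitpid(). */
        rc = clone(execute_clone, (char *)ptr + len, CLONE_VM | SIGCHLD, NULL);
        if (rc == -1) {
                perror("clone failed");
                return 1;
        }

        /* Wait for the child so the shared address space outlives it. */
        if (waitpid(rc, NULL, 0) == -1) {
                perror("waitpid failed");
                return 1;
        }

        munmap(ptr, len);
        return 0;
}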
Venkatram Tummala