ansaurus

Question

Interaction of fork and user-space memory mapped in the kernel

Answer 1

+1 A:

A fork() will not interfere with get_user_pages(): get_user_pages() will give you a struct page.

You would need to kmap() it before being able to access it, and this mapping is done in kernel space, not userspace.

EDIT: get_user_pages() does touch the page table, but you should not be worried about this: it just makes sure that the pages are mapped in userspace, and returns -EFAULT if it had any problem doing so.

If you fork(), the child will be able to see that page until copy-on-write is performed. Once copy-on-write is done (because the child/the driver/the parent wrote to the page through the userspace mapping -- not through the kernel kmap() the driver has), that page will no longer be shared. If you still hold a kmap() on the page (in the driver code), you will not be able to know whether you are holding the parent's page or the child's.

1) It's not a security hole, because once you execve(), all of that is gone.

2) When you fork() you want both processes to be identical (it's a fork!). I would think that your design should allow both the parent and the child to access the driver. execve() will flush everything.

What about adding some functionality in userspace like:

 int fd = open("/dev/your_thing", O_RDWR);
 void *mapping = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

When mmap() is called on your device, you install a memory mapping, with special flags: http://os1a.cs.columbia.edu/lxr/source/include/linux/mm.h#071

You have some interesting things like:

#define VM_SHARED       0x00000008
#define VM_LOCKED       0x00002000
#define VM_DONTCOPY     0x00020000      /* Do not copy this vma on fork */

VM_SHARED will disable copy-on-write. VM_LOCKED will disable swapping on that page. VM_DONTCOPY will tell the kernel not to copy the vma region on fork, although I don't think it's a good idea.
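To see the effect of the shared flag from userspace, here is a minimal sketch. It does not use the driver at all: it assumes an anonymous mapping as a stand-in, relying on the fact that a MAP_SHARED mapping ends up with VM_SHARED set on its vma. The function name `value_seen_by_parent` is made up for this illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page with the given sharing
 * flag (MAP_SHARED or MAP_PRIVATE), fork, let the child write 42 into
 * the page, and return the value the parent reads afterwards.
 * Returns -1 on error. */
int value_seen_by_parent(int sharing_flag)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                  sharing_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *p = 1;                        /* value before the fork */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, page);
        return -1;
    }
    if (pid == 0) {
        *p = 42;                   /* private mapping: triggers CoW in the child */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(p, page);
        return -1;
    }
    int seen = *p;                 /* shared: child's write; private: still 1 */
    munmap(p, page);
    return seen;
}
```

With MAP_SHARED the parent should read the child's write, because both processes keep using the same physical page. With MAP_PRIVATE the child's write goes to its own copy and the parent keeps the original value.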

Nicolas Viennot 2010-10-28 19:20:33

Thanks for this interesting answer. I'm taking up the maintenance of existing code and just starting with Linux kernel programming, and hadn't caught on `kmap` as being relevant. I don't really care if the child can't access our driver; if the process forks, it would be for an unrelated purpose like `popen`. I don't control how the memory is allocated in userspace (we even have some code that passes a buffer on the stack to our driver). Is there a way for the *driver* to say “I want this physical page to remain mapped in the parent” (no matter how the parent obtained the page)?

Gilles 2010-10-28 20:35:19

You can use the `mlock()` syscall, which will basically set the VM_LOCKED flag on the targeted vma.

Nicolas Viennot 2010-10-28 21:31:52

Answer 2

A:

The short answer is to use madvise(addr, len, MADV_DONTFORK) on any userspace buffers you give to your driver. This tells the kernel that the mapping should not be copied from parent to child and so there is no CoW.

The drawback is that the child inherits no mapping at that address, so if you want the child to then start using the driver it will need to remap that memory. But that is fairly easy to do in userspace.
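A minimal sketch of that behaviour, assuming an anonymous private mapping stands in for the buffer you would hand to the driver (the function name `child_loses_dontfork_buffer` is made up for this example). The child deliberately touches the buffer, and the parent checks that the child was killed by SIGSEGV because the mapping was not inherited.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child died with SIGSEGV when
 * touching the MADV_DONTFORK buffer, 0 if the child could read it,
 * -1 on error. */
int child_loses_dontfork_buffer(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 'x';

    /* Ask the kernel not to copy this mapping into children. */
    if (madvise(buf, page, MADV_DONTFORK) != 0) {
        munmap(buf, page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, page);
        return -1;
    }
    if (pid == 0) {
        /* The child has no mapping at this address, so this read
         * is expected to fault. */
        char c = *(volatile char *)buf;
        (void)c;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(buf, page);
        return -1;
    }
    munmap(buf, page);
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV) ? 1 : 0;
}
```

The parent's own view of the buffer is untouched by the fork, which is the property the driver needs: there is no CoW that could move the parent onto a different physical page.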

Update: A buffer on the stack is problematic, and I'm not sure you can make it safe in general.

You can't mark it DONTFORK, because your child might be running on that stack page when it forks, or (worse in a way) it might do a function return later and hit the unmapped stack page. (I even tested this: you can happily mark your stack DONTFORK, and bad things happen when you fork.)

The other way to avoid a CoW is to create a shared mapping, but you can't map your stack shared for obvious reasons.

That means you risk a CoW if you fork. Even if the child "just" execs it might still touch the stack page and cause a CoW, leading to the parent getting a different page, which is bad.

The one minor point in your favor is that code using an on-stack buffer only needs to worry about code it calls forking, i.e. you can't use an on-stack buffer after the function has returned. So you only need to audit your callees, and if they never fork you're safe. That may still be infeasible, and it is fragile if the code ever changes.

I think you really want all memory that is given to your driver to come from a custom allocator in userspace. It shouldn't be that intrusive. The allocator can either mmap your device directly, as the other answer suggested, or just use anonymous mmap, madvise(DONTFORK), and probably mlock() to avoid swap out.
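A rough sketch of the anonymous-mmap variant of such an allocator follows. The names `driver_buf_alloc` and `driver_buf_free` are invented for illustration, and treating an mlock() failure as non-fatal (it can fail when RLIMIT_MEMLOCK is too low) is an assumption; a real driver may need to refuse unlocked buffers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a length up to a whole number of pages. */
static size_t round_up_to_page(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (len + page - 1) / page * page;
}

/* Hypothetical allocator for buffers handed to the driver.
 * Returns a page-aligned buffer that is not inherited across fork(),
 * or NULL on failure. If locked is non-NULL, *locked is set to 1 when
 * mlock() succeeded and to 0 otherwise (assumed non-fatal here). */
void *driver_buf_alloc(size_t len, int *locked)
{
    if (len == 0)
        return NULL;

    size_t size = round_up_to_page(len);
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Keep the mapping out of children so no CoW can happen on it. */
    if (madvise(buf, size, MADV_DONTFORK) != 0) {
        munmap(buf, size);
        return NULL;
    }

    /* Try to keep the pages resident so they are not swapped out. */
    int ok = (mlock(buf, size) == 0);
    if (locked)
        *locked = ok;
    return buf;
}

/* Release a buffer from driver_buf_alloc(); len is the length that was
 * originally requested. munmap() also drops the memory lock. */
void driver_buf_free(void *buf, size_t len)
{
    if (buf)
        munmap(buf, round_up_to_page(len));
}
```

Callers would replace on-stack or malloc()ed buffers with `driver_buf_alloc(len, NULL)` before passing them to the driver. Since whole pages are allocated, small buffers waste space; a real allocator would probably carve several buffers out of one DONTFORK region.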

mpe 2010-10-28 22:46:42

@mpe: Actually I was already aware of `MADV_DONTFORK`, but it would be a restriction compared to what we do now (we have code that uses a buffer on the stack). I should have mentioned this in my question. If you have a long answer, I'd be interested to read it.

Gilles 2010-10-28 22:52:30
