I'm working on a memory tracking library where we use mprotect to remove access to most of a program's memory and a SIGSEGV handler to restore access to individual pages as the program touches them. This works great most of the time.

My problem is that when the program invokes a system call (say, read) with memory that my library has marked no access, the system call just returns -1 and sets errno to EFAULT. This changes the behavior of the programs being tested in strange ways. I would like to be able to restore access to each page of memory given to a system call before it actually goes to the kernel.

My current approach is to create a wrapper for each system call that touches memory. Each wrapper would touch all the memory given to it before handing it off to the real system call. It seems like this will work for calls made directly from the program, but not for those made by libc (for instance, fread will call read directly without using my wrapper). Is there any better approach? How is it possible to get this behavior?
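A rough sketch of what one such wrapper might look like for read. The names `touch_range` and `tracked_read` are hypothetical, and the sketch assumes a SIGSEGV handler is already installed that restores access when a protected page is touched. Because read makes the kernel write into the buffer, each page is both loaded from and stored to, so the page ends up writable whichever kind of access the handler reacts to.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Touch one byte in every page overlapped by [buf, buf + len).
 * If 'writable' is nonzero the byte is also written back unchanged,
 * which forces a write fault on pages that are read-only.
 * Caveat: the write-back is not atomic, so this assumes no other
 * thread is modifying the buffer at the same time. */
static void touch_range(void *buf, size_t len, int writable) {
    if (len == 0) return;
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)buf;
    uintptr_t end = start + len - 1;          /* last byte in range */
    for (uintptr_t a = start & ~(page - 1); a <= end; a += page) {
        /* Stay inside the caller's buffer on the first page. */
        volatile char *p = (volatile char *)(a < start ? start : a);
        char c = *p;           /* read fault, if the page is protected */
        if (writable) *p = c;  /* write fault, if the page is read-only */
    }
}

/* Hypothetical wrapper: make the buffer accessible, then call read. */
static ssize_t tracked_read(int fd, void *buf, size_t count) {
    touch_range(buf, count, 1);
    return read(fd, buf, count);
}
```

This only helps for calls that go through the wrapper, which is exactly the limitation with libc-internal calls mentioned above.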

+3  A: 

You can use ptrace(2) to achieve this. It allows you to monitor a process and get told whenever certain events occur. For your purposes, look at PTRACE_SYSCALL which allows you to stop the process upon syscall entry and exit.

You will have to change some of your memory tracking infrastructure, however: ptrace works by having a parent process monitor a child process, and the child itself is not aware when a monitored event occurs. Having said that, you should be able to do something along the lines of:

  • Set up the ptrace parent and child, monitoring (at least) PTRACE_SYSCALL.
  • The child process makes a syscall, and the parent is notified.
  • The parent saves the requested syscall info, then uses PTRACE_GETREGS and PTRACE_SETREGS to change the child's state so that, instead of making the syscall, the child calls the 'memory unprotect' routine.
  • The child unprotects its memory, then raises SIGUSR1 or similar to tell the controlling parent that the memory work is complete.
  • The parent catches SIGUSR1, uses PTRACE_SETREGS to restore the previously saved syscall info, and resumes the child.
  • The child resumes and executes the original syscall.
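
As a starting point, the first two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it targets Linux, the function name `trace_syscall_entries` is made up, and the register rewriting with PTRACE_GETREGS and PTRACE_SETREGS is left out because the register layout is architecture specific. It uses PTRACE_O_TRACESYSGOOD so that syscall stops can be told apart from ordinary signal stops.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a traced child and count the syscall-entry stops the parent sees.
 * Returns the count, or -1 if tracing could not be set up. */
static int trace_syscall_entries(void) {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        /* Child: ask to be traced, then stop so the parent can attach. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) _exit(1);
        raise(SIGSTOP);
        syscall(SYS_getpid);   /* a syscall for the parent to observe */
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status)) return -1;
    /* Mark syscall stops with bit 0x80 in the reported stop signal. */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void *)PTRACE_O_TRACESYSGOOD);

    int entries = 0;
    int at_entry = 1;          /* stops alternate: entry, exit, entry, ... */
    for (;;) {
        /* Resume until the next syscall entry or exit. Passing 0 as the
         * signal means pending signals are suppressed, which is a
         * simplification; a real tool would forward them. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) != 0) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) < 0) return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status)) break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            if (at_entry) {
                /* The syscall has not run yet. This is where a tool would
                 * read the registers and redirect the child. */
                entries++;
            }
            at_entry = !at_entry;
        }
    }
    return entries;
}
```

The exact count depends on what libc does around raise and _exit, so only "at least one entry stop" should be relied on.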
Dave Rigby
http://lkml.org/lkml/2008/8/25/40 There have been some experiments to allow ptrace on yourself, to catch your own syscalls... but yeah, this multi-process dance would generally be necessary.
ephemient
I think this is the right solution, but it won't work as specified. When I stop the child at a system call, the value of eip is at the system call return point (popping registers off stack in __kernel_vsyscall) rather than at the sysenter instruction. So the system call has already started (and is doomed to fail) when the parent catches it. I can probably catch it when it returns though, invoke the un-protecting routine, then restart the call.
Jay Conrod
@Jay: Any chance you could post (a minimal version of) your code? You should be able to catch the child before it actually performs the syscall...
Dave Rigby
@Dave Rigby, I posted some sample code I was working on earlier to http://www.jayconrod.com/scratch/ptrace_test.c. I'm using GDB to inspect contents of the user_info struct. I believe the first time the program stops with SIGTRAP, it's entering the write call. I then use the disassemble command with the value from user_info.regs.eip. The instruction is "pop %ebp", and the address is __kernel_vsyscall+16 (the return point). Thanks for your continued help.
Jay Conrod
@Jay: I ran the test code you linked, it appears to work for me - I see the first SIGTRAP occur on the start of SYS_write.
Dave Rigby
@Dave Rigby, You were completely right. I was able to redirect the program both before and after system calls. My current solution is to wait until after the system call, look for the EFAULT error code, and have the program jump into some fixup code. The fixup code restores the registers (the kernel blows away ecx, edx, and ebp but __kernel_vsyscall saves them), calls a routine that reads from the appropriate pages (to generate the SIGSEGV), then jumps back to __kernel_vsyscall to restart.
Jay Conrod
@Jay: Great - glad it worked for you!
Dave Rigby