views:

539

answers:

3

I am writing a GUI-oriented debugger which targets Linux primarily, but I plan ports to other OSes in the future. Because the GUI must stay interactive at all times, I have a few threads handling different things.

Primarily I have a "debug event" thread which simply loops waiting for waitpid to return and delivers the received events to the other threads. I do this because waitpid does not have a timeout, which makes it very hard to integrate it with other event loops and keep things responsive (waitpid can hang indefinitely!).
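The wait-thread arrangement described above might look roughly like the following. This is only a sketch: `struct debug_event`, `wait_thread` and `start_wait_thread` are made-up names, and a pipe is just one possible way to hand events over to a GUI event loop.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical event record handed from the wait thread to the GUI thread. */
struct debug_event {
    pid_t pid;
    int   status;
};

/* Thread body: block in waitpid() and forward every event through a pipe. */
static void *wait_thread(void *arg)
{
    int write_fd = (int)(intptr_t)arg;
    struct debug_event ev;

    for (;;) {
        /* __WALL makes waitpid() report clone()d threads as well. */
        ev.pid = waitpid(-1, &ev.status, __WALL);
        if (ev.pid == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            break;              /* e.g. ECHILD: nothing left to wait for */
        }
        if (write(write_fd, &ev, sizeof ev) != (ssize_t)sizeof ev)
            break;
    }
    close(write_fd);
    return NULL;
}

/* Starts the wait thread and returns the readable end of the pipe, or -1. */
int start_wait_thread(pthread_t *thread)
{
    int fds[2];

    assert(thread != NULL);
    if (pipe(fds) == -1)
        return -1;
    if (pthread_create(thread, NULL, wait_thread,
                       (void *)(intptr_t)fds[1]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return fds[0];
}
```

The GUI side can add the returned descriptor to its own poll() or select() set and read one `struct debug_event` at a time, so it never has to block in waitpid itself.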

This strategy has worked wonderfully for the Linux builds so far. Lately I've been trying to make my debugger thread-aware (as in the threads in the debugged application, not the debugger itself).

So I set the ptrace options to follow clone events and look for a status whose upper 16 bits are set to PTRACE_EVENT_CLONE. Then I use PTRACE_GETEVENTMSG to get the TID of the new thread. This all works nicely in my small test harness applications, but for some reason it fails when I put that code in my actual debugger: I get a "No such process" error code.
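For reference, the clone-event handling described here can be sketched as below. The helper names (`is_clone_event`, `enable_clone_tracing`, `new_thread_tid`) are invented for illustration and are not taken from the actual debugger; the ptrace requests and constants are the ones from `<sys/ptrace.h>`.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 1 if a waitpid() status describes a PTRACE_EVENT_CLONE stop:
   the tracee stopped with SIGTRAP and the event code sits in bits 16+. */
int is_clone_event(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_CLONE << 8));
}

/* Asks the kernel to report clone() calls made by an already-stopped tracee.
   Returns 0 on success, -1 on failure (errno is set by ptrace). */
int enable_clone_tracing(pid_t pid)
{
    assert(pid > 0);
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACECLONE) == -1)
        return -1;
    return 0;
}

/* After a clone event, fetches the TID of the new thread.
   Returns the TID, or -1 on failure (errno is set by ptrace). */
long new_thread_tid(pid_t pid)
{
    unsigned long tid = 0;

    assert(pid > 0);
    if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &tid) == -1)
        return -1;
    return (long)tid;
}
```

The status check itself is pure arithmetic and can run on any thread; the two ptrace calls are the part that depends on which thread is the tracer.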

The one thing that occurred to me is that Windows has a rule that only the thread which attached to an application can listen for debug events. Does Linux's ptrace have a similar limitation? If so, why does my code work for other debug events?

EDIT:

It seems that at the very least waitpid supports waiting from a different thread, the man page says:

Before Linux 2.4, a thread was just a special case of a process, and as a consequence one thread could not wait on the children of another thread, even when the latter belongs to the same thread group. However, POSIX prescribes such functionality, and since Linux 2.4 a thread can, and by default will, wait on children of other threads in the same thread group.

So at most this is a ptrace limitation.

A: 

Is there a compelling reason to re-implement your own debugger? Take a look at gdb, and specifically, the gdbserver/gdbclient architecture. If you want a richer GUI for the debug environment, then you could code on top of the gdbclient protocol, and provide a better user interface than the gdb commandline, and never have to worry about issues like this.

slacy
Because I don't feel Linux needs yet another GDB front end. GDB has a focus on debugging applications to which you have the source code; mine focuses more on reverse engineering. It is much closer to OllyDbg in design and usage. It has been progressing quite nicely and is quite functional: http://www.codef00.com/projects.php#Debugger
Evan Teran
Of course GDB *can* debug binaries, but something more focused on binary-level analysis better suits the needs of a reverse engineer.
Evan Teran
you didn't really answer the question...
+1  A: 

As far as I can tell, this is not allowed. A task cannot use ptrace on a task to which it has not attached. Also, a task can be traced by at most one other task, so you can't simply attach to it once from each thread. I think this is because when one task attaches to another task, the tracing task becomes the parent of the traced task, and each task can only have one parent.

It seems like multi-thread tracing ought to be allowed because the threads are part of the same process, but implementation-wise, there isn't actually much distinction between threads and processes in the Linux kernel. A thread is just a task that happens to share most of its resources with another task.

If you're interested, you can browse the source code for ptrace in the kernel. Specifically look at ptrace_check_attach, which is called by sys_ptrace for most requests. It returns -ESRCH (sounds like the error code you're getting) if the target task's parent is not the current task.
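From user space, that check shows up as ptrace returning -1 with errno set to ESRCH, whose message is "No such process". A small sketch of how one might surface this in a debugger follows; `geteventmsg_errno`, `is_not_my_tracee` and `report_ptrace_error` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Issues PTRACE_GETEVENTMSG and returns 0 on success,
   otherwise the errno value left behind by ptrace(). */
int geteventmsg_errno(pid_t pid, unsigned long *msg)
{
    assert(msg != NULL);
    errno = 0;
    if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, msg) == -1)
        return errno;
    return 0;
}

/* ESRCH from ptrace does not have to mean that the pid is gone: it is also
   returned when the target is not stopped or is not traced by the caller. */
int is_not_my_tracee(int err)
{
    return err == ESRCH;
}

/* Prints a diagnostic for a failed ptrace request. */
void report_ptrace_error(const char *request, int err)
{
    fprintf(stderr, "%s failed: %s%s\n", request, strerror(err),
            is_not_my_tracee(err)
                ? " (pid missing, not stopped, or not traced by this thread)"
                : "");
}
```

Logging the calling thread alongside this message makes it easier to spot a request that was issued from a thread other than the one that attached.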

Jay Conrod
yes, this is the feeling that I get as well. Interestingly enough, the wait family does allow a thread to wait on children of other threads. So it seems that it is doable (just use similar logic to wait). It does look like line 94 in ptrace.c is the culprit. Thanks!
Evan Teran
+1  A: 

I had the same issue (plus many others!) while implementing the Linux-specific part of the Maxine VM debugger. You are correct in your guess that only one thread in the debugger can use ptrace to control the debuggee. We accomplish this by making all calls to ptrace on a dedicated thread. You may find it useful to look at the LinuxTask.java, linuxTask.h and linuxTask.c files in the Maxine sources available at kenai.com/projects/maxine/sources/maxine/show
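A minimal sketch of the dedicated-thread idea, assuming a single request slot guarded by a mutex and condition variable. This is not the Maxine code; `struct tracer`, `tracer_call` and the other names are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

typedef long (*tracer_fn)(void *arg);

/* One worker thread that runs every ptrace request on behalf of the others. */
struct tracer {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    tracer_fn       fn;      /* pending request, NULL when the slot is free */
    void           *arg;
    long            result;
    int             done;    /* result is ready for the caller to collect */
    int             quit;
};

static void *tracer_main(void *p)
{
    struct tracer *t = p;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        while (t->fn == NULL && !t->quit)
            pthread_cond_wait(&t->cond, &t->lock);
        if (t->fn == NULL)
            break;                          /* quit requested, nothing pending */
        tracer_fn fn = t->fn;
        void *arg = t->arg;
        pthread_mutex_unlock(&t->lock);
        long r = fn(arg);                   /* always runs on this thread */
        pthread_mutex_lock(&t->lock);
        t->result = r;
        t->fn = NULL;
        t->done = 1;
        pthread_cond_broadcast(&t->cond);
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

int tracer_start(struct tracer *t)
{
    t->fn = NULL; t->arg = NULL; t->result = 0; t->done = 0; t->quit = 0;
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    return pthread_create(&t->thread, NULL, tracer_main, t);
}

/* Runs fn(arg) on the tracer thread and blocks until it has returned. */
long tracer_call(struct tracer *t, tracer_fn fn, void *arg)
{
    assert(fn != NULL);
    pthread_mutex_lock(&t->lock);
    while (t->fn != NULL || t->done)        /* wait for a free slot */
        pthread_cond_wait(&t->cond, &t->lock);
    t->fn = fn;
    t->arg = arg;
    pthread_cond_broadcast(&t->cond);
    while (!t->done)
        pthread_cond_wait(&t->cond, &t->lock);
    long r = t->result;
    t->done = 0;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
    return r;
}

void tracer_stop(struct tracer *t)
{
    pthread_mutex_lock(&t->lock);
    t->quit = 1;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
    pthread_join(t->thread, NULL);
}

/* Returns 1 when called on the tracer thread; arg is the struct tracer. */
long tracer_is_current(void *arg)
{
    struct tracer *t = arg;
    return pthread_equal(pthread_self(), t->thread) != 0;
}

/* Example request: attach to the pid that arg points to. Because it runs
   on the tracer thread, that thread becomes the tracer. */
long req_attach(void *arg)
{
    return ptrace(PTRACE_ATTACH, *(pid_t *)arg, NULL, NULL);
}
```

Any thread can then write something like `tracer_call(&t, req_attach, &pid)`, and later requests such as PTRACE_GETEVENTMSG would go through `tracer_call` in the same way, so that every ptrace call comes from the thread that attached.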

Doug Simon
thanks, I'll check it out.
Evan Teran