views:

79

answers:

2

Firstly, I use pthread library to write multithreading c program. Threads always hung by their waited mutexs. When I use the strace utility to find a thread is in FUTEX_WAIT status, I want to know which thread hold that mutex at the time. But I don't know how could I make it. Are there any utilities could do that? Someone told me java virtual machine support this, So I want to know whether linux support this feature.

+3  A: 

I don't know of any such facility so I don't think you will get off that easily - and it probably wouldn't be as informative as you think in helping to debug your program. As low tech as it might seem, logging is your friend in debugging these things. Start collecting your own little logging functions. They don't have to be fancy, they just have to get the job done while debugging.

Sorry for the C++ but something like:

void logit(const bool aquired, const char* lockname, const int linenum)
{
    pthread_mutex_lock(&log_mutex);

    if (! aquired)
        logfile << pthread_self() << " tries lock " << lockname << " at " << linenum << endl;
    else
        logfile << pthread_self() << " has lock "   << lockname << " at " << linenum << endl;

    pthread_mutex_unlock(&log_mutex);
}


void someTask()
{
    logit(false, "some_mutex", __LINE__);

    pthread_mutex_lock(&some_mutex);

    logit(true, "some_mutex", __LINE__);

    // do stuff ...

    pthread_mutex_unlock(&some_mutex);
}

Logging isn't a perfect solution but nothing is. It usually gets you what you need to know.

Duck
Logging indeed is quite useful tool for debugging. Thanks for your suggestions.
terry
+1 Who doesn't love logging? It could be done with no code changes using LD_PRELOAD (and some patience). Wrap `pthread_mutex_*` functions with something that logged the function calls, the mutex' address, and a thread identifier (`pthread_t` happens to be an integral type on Linux, not a portable assumption but quite a convenience).
pilcrow
+4  A: 

You can use knowledge of the mutex internals to do this. Ordinarily this wouldn't be a very good idea, but it's fine for debugging.

Under Linux with the NPTL implementation of pthreads (which is any modern glibc), you can examine the __data.__owner member of the pthread_mutex_t structure to find out the thread that currently has it locked. This is how to do it after attaching to the process with gdb:

(gdb) thread 2
[Switching to thread 2 (Thread 0xb6d94b90 (LWP 22026))]#0  0xb771f424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb771f424 in __kernel_vsyscall ()
#1  0xb76fec99 in __lll_lock_wait () from /lib/i686/cmov/libpthread.so.0
#2  0xb76fa0c4 in _L_lock_89 () from /lib/i686/cmov/libpthread.so.0
#3  0xb76f99f2 in pthread_mutex_lock () from /lib/i686/cmov/libpthread.so.0
#4  0x080484a6 in thread (x=0x0) at mutex_owner.c:8
#5  0xb76f84c0 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb767784e in clone () from /lib/i686/cmov/libc.so.6
(gdb) up 4
#4  0x080484a6 in thread (x=0x0) at mutex_owner.c:8
8               pthread_mutex_lock(&mutex);
(gdb) print mutex.__data.__owner
$1 = 22025
(gdb)

(I switch to the hung thread; do a backtrace to find the pthread_mutex_lock() it's stuck on; change stack frames to find out the name of the mutex that it's trying to lock; then print the owner of that mutex). This tells me that the thread with thread id 22025 is the culprit.

caf
@caf Is there a way to correlated __data__.__owner__ with pthread thread id? In playing with this I simply coded log << mutex.__data__.owner << endl and that appears to work fine. But the data.owner is a value like 9841 while the tid is like 140505876686608. What is the relationship between the two values?
Duck
@Duck: The value in `.__data.__owner` is a TID. When each thread starts you could just have them log their TID (using `tid = syscall(SYS_gettid);`) as well as their `pthread_t` (from `pthread_self()`).
caf
@caf - Excellent. Thanks.
Duck