Hi, I have a server-client program with multiple threads in both the server and the client. The number of clients and servers varies (e.g., 3 servers (replicas) and 10 clients). I am debugging one source file in this program, and I think there is some kind of deadlock, possibly the following:
A server method already holds a mutex, and a request from a client invokes a server method that tries to acquire the same mutex again.
The program is launched by a test script that spawns the servers and clients and makes the clients send specific requests to the servers. I put the following code in the suspicious area to check for a deadlock, but it doesn't seem to work, i.e. the code enters neither branch:
```cpp
if (pthread_mutex_lock(&a_mutex) == EDEADLK) {
    cout << "couldnt acquire lock." << endl;
} else {
    cout << "acquired lock" << endl;
}
```
I tried debugging with gdb by attaching to one of the running server processes. In separate gdb sessions I added a "display" and a "watch" on a_mutex, and got output of the following form:
```
1: a_mutex = {__data = {__lock = 2, __count = 0, __owner = 4193, __kind = 0, __nusers = 2,
    {__spins = 0, __list = {__next = 0x0}}},
  __size = "\002\000\000\000\000\000\000\000a\020\000\000\000\000\000\000\002\000\000 \000\000\000\000", __align = 2}
```
I don't understand every field in the output above, but I can see that a thread (4193) is holding the mutex. Here is that thread's backtrace (snipped):
```
#0  0xb8082430 in __kernel_vsyscall ()
#1  0xb7e347a6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7e345be in sleep () from /lib/tls/i686/cmov/libc.so.6
#3  0x0804cb59 in class1::method1 (this=0xbfa9fe6c, clt=1, id=
    {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0xb7c9c11c "l/%\b"}})
    at file1.cc:33
```
I don't know where the bug is or how to find it.
I would greatly appreciate any help with the following questions:
- What is a good way to debug programs with conditions like this?
- How do I detect the deadlock condition (i.e., a lock that is acquired and never released)?
- In a multi-process program like this, is there a better way to use gdb (e.g., inspecting the state of all processes, or configuring gdb to watch/display a variable before the process starts)?
- I ask because when I attach gdb to a server after the tester script has started it, the server may already have run past the code I want to inspect. I tried adding sleep(20) before the suspicious area to give myself time to attach, but I don't think this is a good approach. Opening multiple terminals, starting the servers and clients manually, and checking the state of each one also doesn't seem like a good idea (please correct me if I am wrong).
PS: I have read this question already.
Thank you very much.