I have a daemon process that does configuration management. All the other processes must interact with this daemon in order to function. When I execute a large action, the daemon becomes unresponsive after a few hours and stays that way for 2 to 3 hours. After those 2 to 3 hours it works normally again.

Debugging utilities for Linux process hang issues?

How can I find out at what point the Linux process hangs?

+1  A: 

One option is to use gdb and its attach command to attach to the running process. You will need to load a file containing the symbols of the executable in question (using the file command).
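As a rough sketch of how the attach step could be scripted, the snippet below builds a gdb command line that attaches to a process, prints a backtrace for every thread, and exits again. The helper names `gdb_backtrace_command` and `dump_backtraces` are made up for illustration; the only gdb features assumed are the `-batch`, `-ex` and `-p` options and the `thread apply all bt` command.

```python
import shutil
import subprocess


def gdb_backtrace_command(pid, executable=None):
    """Build a gdb command that attaches to `pid`, prints all backtraces, and exits.

    Hypothetical helper: `executable` is the path to the binary with symbols.
    """
    cmd = ["gdb", "-batch", "-ex", "thread apply all bt"]
    if executable is not None:
        # Same form as `gdb /path/to/executable PID`
        cmd += [executable, str(pid)]
    else:
        # Let gdb locate the executable from the process itself
        cmd += ["-p", str(pid)]
    return cmd


def dump_backtraces(pid, executable=None, timeout=60):
    """Run gdb once against the hung process and return its output as text."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb was not found on PATH")
    result = subprocess.run(
        gdb_backtrace_command(pid, executable),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling `dump_backtraces(pid)` while the daemon is unresponsive should show where each thread is stuck. Attaching normally requires running as the same user as the daemon or as root, and some systems restrict it further.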

Yann Ramin
You can also do this straight from the command line with `gdb /path/to/executable PID`
R Samuel Klatchko
+2  A: 
  • strace can show the most recent system calls and their results.
  • lsof can show open files.
  • The system log can be very effective when log messages are written to track progress. It lets you box the problem into smaller areas. Also correlate the log messages with messages from other systems; this often turns up interesting results.
  • wireshark, if the apps use sockets, makes the wire chatter visible.
  • ps ax + top can show whether your app is in a busy loop (i.e. running all the time), sleeping, or blocked in I/O, and how much CPU and memory it is using.

Each of these may give a little bit of information which together build up a picture of the issue.
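The process state that ps and top display can also be read directly from `/proc`, which is handy when you want to log it from a script while waiting for the hang. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the function names and the `STATE_NAMES` table are illustrative and cover only the most common state letters.

```python
import os

# Common one-letter process states as shown by ps and top (not exhaustive)
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep (usually I/O)",
    "T": "stopped",
    "Z": "zombie",
}


def parse_stat_state(stat_line):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' before reading the state.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def process_state(pid):
    """Return the current state letter of the process with the given pid."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def describe(pid):
    """Return a short human-readable description of the process state."""
    state = process_state(pid)
    return f"{pid}: {state} ({STATE_NAMES.get(state, 'other')})"


if __name__ == "__main__":
    # Demonstrate on the current process; pass the daemon's pid in practice
    print(describe(os.getpid()))
```

A process that stays in `R` is likely spinning in a busy loop, while one stuck in `S` or `D` is waiting on something, which is where strace or a backtrace can narrow things down.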

When using gdb, it might be useful to trigger a core dump when the app is blocked. Then you have a static snapshot which you can analyze with post-mortem debugging at your leisure. You can have these dumps triggered by a script. Then you quickly build up a set of snapshots which can be used to test your theories.
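One way such a script might look is sketched below. It assumes the `gcore` utility that ships with gdb is installed, and it assumes gcore's usual naming scheme in which `-o prefix` produces a file called `prefix.<pid>`. The helper names, the snapshot naming, and the default count and interval are all made up for illustration.

```python
import shutil
import subprocess
import time


def gcore_command(pid, prefix):
    """Build the gcore command line that dumps `pid` to `<prefix>.<pid>`."""
    return ["gcore", "-o", prefix, str(pid)]


def snapshot_prefix(directory, index):
    """Return a numbered file prefix for one snapshot (hypothetical naming)."""
    return f"{directory}/snapshot-{index:03d}"


def take_snapshots(pid, directory, count=3, interval=60.0):
    """Dump `count` core files of a running process, `interval` seconds apart.

    The process keeps running after each dump. Returns the expected paths
    of the core files, based on the assumed `<prefix>.<pid>` naming.
    """
    if shutil.which("gcore") is None:
        raise RuntimeError("gcore was not found on PATH")
    written = []
    for i in range(count):
        prefix = snapshot_prefix(directory, i)
        subprocess.run(gcore_command(pid, prefix), check=True)
        written.append(f"{prefix}.{pid}")
        if i + 1 < count:
            time.sleep(interval)
    return written
```

Each resulting file can then be opened with `gdb /path/to/executable /path/to/corefile`. Comparing backtraces across snapshots shows whether the process is stuck at one spot or slowly making progress.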

Peter Tillemans
I don't think `ps ax` is a good utility for detecting busy loops; `top` would do better for this.
Dmitry Yudakov
You are right, of course. I almost always use these side by side, so they got mixed up in my head. Thanks, I updated the answer.
Peter Tillemans
Hi Peter, strace helped me troubleshoot the hang issue. The hang was caused by a deadlock between two processes over a file lock taken with 'flock()'. Thank you very much for the help. Without strace it would have been nearly impossible to find out why the process was hanging.
Niranjan
I am glad my answer was helpful. You might consider accepting the answer; then your acceptance ratio goes up, which makes people more likely to help you, and I get the 15 points of course ;-).
Peter Tillemans