I'm trying to determine the reason for a stalled process on Linux. It's a telecom application, running under fairly heavy load. There is a separate process for each of 8 T1 spans. Every so often, one of the processes will get very unresponsive - up to maybe 50 seconds before an event is noted in the normally very busy process's log.
It is likely some system resource that runs short. The obvious thing - CPU usage - looks to be OK.
Which linux utilities might be best for catching and analyzing this sort of thing, and be as unobtrusive about it as possible, as this is a highly loaded system? It would need to be processes rather than system oriented, it would seem. Maybe ongoing monitoring of /proc/pid/XX? Top wouldn't seem to be too useful here.