views:

971

answers:

5

Hi,

I want to know if there is an efficient solution to monitor a process resource consumption (cpu, memory, network bandwidth) in Linux. I want to write a daemon in C++ that does this monitoring for some given PIDs. From what I know, the classic solution is to periodically read the information from /proc, but this doesn't seem the most efficient way (it involves many system calls). For example to monitor the memory usage every second for 50 processes, I have to open, read and close 50 files (that means 150 system calls) every second from /proc. Not to mention the parsing involved when reading these files.

Another problem is the network bandwidth consumption: this cannot be easily computed for each process I want to monitor. The solution adopted by NetHogs involves a pretty high overhead in my opinion: it captures and analyzes every packet using libpcap, then for each packet the local port is determined and searched in /proc to find the corresponding process.

Do you know if there are more efficient alternatives to these methods presented or any libraries that deal with this problems?

+2  A: 

/usr/src/linux/Documentation/accounting/taskstats.txt

Taskstats is a netlink-based interface for sending per-task and per-process statistics from the kernel to userspace.

Taskstats was designed for the following benefits:

  • efficiently provide statistics during lifetime of a task and on its exit
  • unified interface for multiple accounting subsystems
  • extensibility for use by future accounting patches

This interface lets you monitor CPU, memory, and I/O usage by processes of your choosing. You only need to set up and receive messages on a single socket.

This does not differentiate (for example) disk I/O versus network I/O. If that's important to you, you might go with a LD_PRELOAD interception library that tracks socket operations. Assuming that you can control the startup of the programs you wish to observe and that they won't do trickery behind your back, of course.

I can't think of any light-weight solutions if those still fail, but linux-audit can globally trace syscalls, which seems a fair bit more direct than re-capturing and analyzing your own network traffic.

ephemient
taskstats contains only disk I/O, not both net and disk
tuxx
Correction: taskstats monitors only read/write syscalls, and not recv/send and friends (but this can be easily modified inside the kernel). Thanks, anyway. This seems the best solution so far.
tuxx
A: 

Take a look at the linux trace toolkit (LTTng). It inserts tracepoints into the kernel and has some post processing to get some of the kind of statistics you're asking about. The trace files get large if you capture everything, but you can keep things manageable if you limit the types of events you arm.

http://lttng.org for more info...

Matt Brandt
A: 

Regarding network bandwidth: This Superuser answer describes processing /proc/net/tcp to collect network bandwidth usage.

I know that iptables can be used to do network accounting (see, e.g., LWN's, Linux.com's, or Shorewall's articles), but I don't see any practical way to do accounting that on a per-process basis.

Josh Kelley
A: 

Reading /proc is ultimately the only way to monitor CPU and memory usage by individual processes without injecting your code into the kernel. If you look at top(1), you'll see reading lots of files in /proc is exactly what it does every second. All user-mode tools and libraries that retrive this sort of information have to get it from /proc.

As with network bandwidth usage, there are several approaches, which all more or less boil down to capturing all network traffic in and out of the box. You can also consider writing a special netfilter (iptables) module that does exactly the type of counting you need without the overhead of traffic capturing.

Alexey Feldgendler
A: 

i just came across this as i was looking for answers to the same thing. just a note - when using /proc filesystem, you do not have to close the file after each read. you can keep the file open and each time you do a read you will get new statistics... so, you shouldn't have the overhead of opening and closing each time you want to get the stats... i have this working in javascript on node.js if you want an example...

billywhizz