I'm writing a Linux application that observes other applications and tracks their resource consumption. I plan to work in Java, but the programming language isn't important to me. The goal is what matters, so I can switch to another technology or use modules. My application runs any selected third-party application as a child process. The child software mostly solves some algorithmic problem such as graph algorithms, string search, etc. The observer program tracks the child's resource usage until it finishes its job.

If the child application is multi-threaded, is it somehow possible to track how many resources each thread consumes? The application could be written using any shared-memory (non-distributed-memory) threading technology: Java threads, Boost threads, POSIX threads, OpenMP, or any other.

+3  A: 

On modern Linux systems (2.6), each thread has its own identifier that is treated almost the same as a pid. It is shown in the process table (at least in the htop program), and it also has its own /proc entry, i.e. /proc/<tid>/stat. The same per-thread files are also available under /proc/<pid>/task/<tid>/.

Check man 5 proc and pay particular attention to stat, statm, status etc. You should find the information you're interested in there.
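As a rough illustration of reading those files from the observer's side, here is a minimal Python sketch. It assumes a Linux /proc with the stat layout described in man 5 proc (utime is field 14, stime is field 15, both in clock ticks). The helper names parse_stat and thread_cpu_ticks are made up for this example.

```python
import os

def parse_stat(text):
    """Split the contents of a stat file into (comm, fields after comm)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = text.rpartition(")")
    comm = head.split("(", 1)[1]
    rest = tail.split()  # rest[0] is field 3 (state)
    return comm, rest

def thread_cpu_ticks(pid):
    """Return {tid: (utime, stime)} in clock ticks for each thread of pid."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for name in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{name}/stat") as f:
                _, rest = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
        # Field 14 (utime) is rest[11]; field 15 (stime) is rest[12].
        result[int(name)] = (int(rest[11]), int(rest[12]))
    return result

if __name__ == "__main__":
    # Demo on the current process; an observer would pass the child's pid.
    for tid, (utime, stime) in sorted(thread_cpu_ticks(os.getpid()).items()):
        print(tid, utime, stime)
```

An observer that started the child itself already knows the pid, so listing /proc/<pid>/task gives it the thread identifiers without any cooperation from the child.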

The only obstacle is obtaining this thread identifier. It is different from the process ID: getpid() returns the same value in every thread. To get the actual thread identifier, you should use (within a C program):

pid_t tid = syscall(SYS_gettid);
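The same distinction can be seen from a scripting language. The sketch below is Python rather than C and assumes Python 3.8 or later, where threading.get_native_id() returns the kernel-assigned thread ID. The function name collect_ids is invented for this example.

```python
import os
import threading

def collect_ids(n_threads=3):
    """Start n_threads workers and return the (pid, tid) pair each one saw."""
    seen = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            seen.append(ids)
        # Keep every worker alive until all have reported, so that no
        # thread ID can be reused by a later thread.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(f"pid={pid} tid={tid}")
```

Every worker should report the same pid but a different tid.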

By the way, the Java virtual machine (at least its OpenJDK Linux implementation) does that internally and uses it for debugging purposes in its back end, but doesn't expose it through the Java API.

Pavel Shved
+1  A: 

Memory is not allocated to threads, and it is often shared across threads. This makes it generally impossible, or at least meaningless, to talk about the memory consumption of a single thread.

An example could be a program with 11 threads; 1 creating objects and 10 using those objects. Most of the work is done on those 10 threads, but all memory was allocated on the one thread that created the objects. Now how does one account for that?
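One way to see this on Linux is to read the VmRSS line from each thread's status file. The threads of a process share one address space, so each of them should report essentially the same resident set size instead of a per-thread share. This is a minimal Python sketch under that assumption; the helper names vm_rss_kib and rss_per_thread are hypothetical.

```python
import os

def vm_rss_kib(status_text):
    """Extract the VmRSS value (in kB) from the text of a status file."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # no VmRSS line, e.g. for kernel threads

def rss_per_thread(pid):
    """Return {tid: VmRSS in kB} as reported for each thread of pid."""
    out = {}
    task_dir = f"/proc/{pid}/task"
    for name in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{name}/status") as f:
                out[int(name)] = vm_rss_kib(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    return out

if __name__ == "__main__":
    # Demo on the current process; an observer would pass the child's pid.
    print(rss_per_thread(os.getpid()))
```

Small differences between threads can still show up, because the files are read one after another while the process keeps running.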

MSalters
I can't say that you're wrong. But we can assume that a single thread's memory usage is the amount of memory that the specific thread is working with at the current time. Threads mostly lock memory with semaphores or mutexes while working with data.
Pawka
Sorry, but that still doesn't make sense. Read-only memory needs no lock at all. When a mutex is used, there's no reasonable way to determine what memory it protects without going to the code.
MSalters
I'm not talking about read-only memory. We can still count the memory consumption of a thread, for example when it works with some graph, creates nodes for some computation, etc. Each piece of this data could be accessible by one thread only, created by that thread and cleaned up after the work is done.
Pawka
So, when a thread would be working with a node, is the node counted against the memory used? Or is the memory of the entire graph counted towards the memory used by the thread? I'm not saying that there are **no** allocations specific to a thread. That's simply not sufficient. In many real-world programs, you are likely to find that 90% of all memory allocated to a process cannot be uniquely accounted to any of its threads. End result: process uses 50 MB, each of its 10 threads "uses" 1 MB, and 40 MB is "missing". Your results will not be trusted because of this.
MSalters
+1  A: 

If you're willing to use Perl, take a look at this: Sys-Statistics-Linux

I used it together with some of the GD graphing packages to generate system resource usage graphs for various processes.

One thing to watch out for: you'll really need to read up on /proc and understand jiffies. Last time I looked, they weren't documented correctly in the man pages, so you'll probably need to read the kernel source:

http://lxr.linux.no/#linux+v2.6.18/include/linux/jiffies.h
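For the CPU time fields in /proc/<pid>/stat specifically, man 5 proc describes the values as clock ticks that are to be divided by sysconf(_SC_CLK_TCK). A minimal Python sketch of that conversion follows. The function name ticks_to_seconds is made up here, and this covers only those user-space tick values, not the kernel's internal jiffies counter.

```python
import os

def ticks_to_seconds(ticks, ticks_per_second=None):
    """Convert a utime/stime value from /proc/<pid>/stat to seconds."""
    if ticks_per_second is None:
        # Clock ticks per second as seen from user space (commonly 100).
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    return ticks / ticks_per_second

if __name__ == "__main__":
    print(ticks_to_seconds(250, 100))  # explicit rate, for illustration
```

Passing the rate explicitly is only for illustration and testing. A real observer would rely on the value reported by the system.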

Also, remember that in Linux the main difference between a thread and a process is that threads share their address space (and usually other resources, such as open file descriptors). Other than that, the kernel implements them in the same way.

Robert S. Barnes