We have a qthreads-based workflow engine where worker threads pick up bundles of input as they are placed on a queue, then place their output on another queue for other worker threads to run the next stage; and so on until all the input has been consumed and all the output has been generated.
Typically, several threads will be running the same task and others will be running other tasks at the same time. We want to benchmark performance of these threaded tasks in order to target optimization efforts.
It's easy to get the real (elapsed) time that a given thread, running a given task, has taken. We just look at the difference between the return values of the POSIX times() function at the start and end of the thread's run() procedure. However, I cannot figure out how to get the corresponding user and system time. Getting these from the struct tms
that you pass to times() doesn't work, because this structure gives total user and system times of all threads running while the thread in question is active.