I need a very accurate way to time parts of my program. I could use the regular high-resolution clock for this, but that returns wall-clock time, which is not what I need: I need the time spent running only my process.

I distinctly remember seeing a Linux kernel patch that would allow me to time my processes to nanosecond accuracy, except I forgot to bookmark it and I forgot the name of the patch as well :(.

I remember how it works though:

On every context switch, it will read out the value of a high-resolution clock, and add the delta of the last two values to the process time of the running process. This produces a high-resolution accurate view of the process' actual process time.
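To make the idea concrete, here is a minimal user-space sketch of that accounting scheme in C. It is not the patch's code: `task_acct`, `last_switch_ns` and `account_switch` are hypothetical names, and the clock reading is simply passed in as a number.

```c
#include <stdint.h>

/* Hypothetical per-process accounting record; not taken from any real patch. */
struct task_acct {
    uint64_t cpu_ns;   /* total CPU time charged to this process, in ns */
};

/* High-resolution clock reading taken at the previous context switch. */
static uint64_t last_switch_ns;

/* Called on every context switch: charge the time elapsed since the
 * previous switch to the process that was running during that interval. */
static void account_switch(struct task_acct *prev, uint64_t now_ns)
{
    prev->cpu_ns += now_ns - last_switch_ns;
    last_switch_ns = now_ns;
}
```

With this scheme the resolution of the process time is limited only by the clock that is read at each switch, not by the timer tick.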

The regular process time is kept using the regular clock, which I believe is millisecond-accurate (1000Hz); that is much too coarse for my purposes.

Does anyone know what kernel patch I'm talking about? I also remember it was like a word with a letter before or after it -- something like 'rtimer' or something, but I don't remember exactly.

(Other suggestions are welcome too)


The Completely Fair Scheduler suggested by Marko is not what I was looking for, but it looks promising. The problem I have with it is that the calls I can use to get process time still do not return values that are granular enough.

  • times() is returning values 21, 22, in milliseconds.
  • clock() is returning values 21000, 22000, same granularity.
  • getrusage() is returning values like 210002, 22001 (and suchlike); these look a bit more accurate, but the values are conspicuously similar to the others.
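For reference, a comparison like the one above can be reproduced with a sketch along these lines. The helper names (`cpu_us_times` and so on) are made up for illustration, and everything is converted to microseconds so the step sizes can be compared directly.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/times.h>
#include <sys/resource.h>

/* User CPU time via times(), converted from clock ticks to microseconds. */
static long long cpu_us_times(void)
{
    struct tms t;
    long hz = sysconf(_SC_CLK_TCK);   /* clock ticks per second */
    times(&t);
    return (long long)t.tms_utime * 1000000LL / hz;
}

/* Processor time (user + system) via clock(), in microseconds. */
static long long cpu_us_clock(void)
{
    return (long long)clock() * 1000000LL / CLOCKS_PER_SEC;
}

/* User CPU time via getrusage(), in microseconds. */
static long long cpu_us_getrusage(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (long long)ru.ru_utime.tv_sec * 1000000LL + ru.ru_utime.tv_usec;
}

/* Print all three readings side by side. */
static void print_cpu_times(void)
{
    printf("times():     %lld us\n", cpu_us_times());
    printf("clock():     %lld us\n", cpu_us_clock());
    printf("getrusage(): %lld us\n", cpu_us_getrusage());
}
```

Calling `print_cpu_times()` repeatedly inside a busy loop shows how coarsely each value advances on a given kernel.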

So now the problem I'm probably having is that the kernel has the information I need, I just don't know the system call that will return it.

+2  A: 

If you need very small time units for (I assume) testing the speed of your software, I would recommend just running the parts you want to time in a loop millions of times, taking the time before and after the loop, and calculating the average. A nice side effect of doing this (apart from not needing to figure out how to use nanoseconds) is that you get more consistent results, because the random overhead caused by the OS scheduler is averaged out.

Of course, unless your program needs to be able to run millions of times per second, it's probably fast enough if its running time is too short to measure in milliseconds.
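A minimal sketch of this loop-and-average approach in C might look as follows. `work` is a hypothetical stand-in for the code being measured, and using `clock()` (processor time rather than wall-clock time) is an assumption on my part to match the question.

```c
#include <time.h>

/* Hypothetical stand-in for the code being measured. The volatile sink
 * keeps the compiler from optimizing the call away. */
static volatile unsigned long sink;
static void work(void) { sink += 1; }

/* Run fn() `iterations` times and return the average processor time
 * per call in nanoseconds, as reported by clock(). */
static double avg_cpu_ns(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iterations;
}
```

For example, `avg_cpu_ns(work, 10000000)` gives a per-call average even though a single call is far below the clock's resolution. The figure includes the loop and function-call overhead.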

Stein G. Strindhaug
This is exactly what I do if I want to measure speed. You don't say what your goal is. If I want to find out what to optimize, that is a different goal from measurement, and needs different methods. For that, sampling the call stack is what I use.
Mike Dunlavey
+1  A: 

I believe CFS (Completely Fair Scheduler) is what you're looking for.

Marko Dumic
A: 

You can use the High Precision Event Timer (HPET) if you have a fairly recent 2.6 kernel. Check out Documentation/hpet.txt on how to use it. This solution is platform dependent though and I believe it is only available on newer x86 systems. HPET has at least a 10MHz timer so it should fit your requirements easily.

I believe several PowerPC implementations from Freescale support a cycle exact instruction counter as well. I used this a number of years ago to profile highly optimized code but I can't remember what it is called. I believe Freescale has a kernel patch you have to apply in order to access it from user space.

David Holm
A: 

http://allmybrain.com/2008/06/10/timing-cc-code-on-linux/

might be of help to you (directly if you are doing it in C/C++, but I hope it will give you pointers even if you're not)... It claims to provide microsecond accuracy, which just passes your criterion. :)

sundar
+1  A: 

I think I found the kernel patch I was looking for. Posting it here so I don't forget the link:

http://user.it.uu.se/~mikpe/linux/perfctr/ http://sourceforge.net/projects/perfctr/

Edit: It works for my purposes, though not very user-friendly.

rix0rrr
A: 

See this question for some more info.

Something I've used for such things is gettimeofday(). It fills in a structure with seconds and microseconds. Call it before the code, and again after. Then subtract the two structs using timersub, and read the elapsed time from the result's tv_sec (seconds) and tv_usec (microseconds) fields.
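A rough sketch of that pattern, assuming glibc (where timersub is a macro in sys/time.h). `busy_work`, `elapsed_seconds` and `time_call` are hypothetical names. Note that gettimeofday() measures wall-clock time, so the result includes any time the process spent descheduled.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes timersub() on glibc */
#endif
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for the code being timed. */
static volatile unsigned long counter;
static void busy_work(void)
{
    for (int i = 0; i < 1000; i++)
        counter += 1;
}

/* Difference end - start, in seconds, combining tv_sec and tv_usec. */
static double elapsed_seconds(const struct timeval *start,
                              const struct timeval *end)
{
    struct timeval diff;
    timersub(end, start, &diff);
    return (double)diff.tv_sec + (double)diff.tv_usec / 1e6;
}

/* Wall-clock seconds taken by one call to fn(). */
static double time_call(void (*fn)(void))
{
    struct timeval start, end;
    gettimeofday(&start, NULL);
    fn();
    gettimeofday(&end, NULL);
    return elapsed_seconds(&start, &end);
}
```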

Will Mc
+2  A: 

If you are looking for this level of timing resolution, you are probably trying to do some micro-optimization. If that's the case, you should look at PAPI. Not only does it provide both wall-clock and virtual (process only) timing information, it also provides access to CPU event counters, which can be indispensable when you are trying to improve performance.

http://icl.cs.utk.edu/papi/

mch
+1  A: 

Try the CPU's timestamp counter? Wikipedia seems to suggest using clock_gettime().
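A minimal sketch of the clock_gettime() route. The choice of CLOCK_PROCESS_CPUTIME_ID (the POSIX per-process CPU-time clock) is my assumption about which clock fits the question, and `process_cpu_ns` is a made-up helper name. The actual resolution depends on the kernel and hardware.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime() */
#endif
#include <stdint.h>
#include <time.h>

/* CPU time consumed by this process so far, in nanoseconds,
 * or -1 if the clock is not supported on this system. */
static int64_t process_cpu_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Read it before and after the code of interest and subtract the two values. On older glibc versions you may need to link with -lrt.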

Jason S