
I wish to calculate the time it took for an API to return a value. The time taken for such an action is on the order of nanoseconds. As the API is a C++ class/function, I am using ctime to calculate it:

#include <ctime>
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
    clock_t start = clock();

    // ... <API call to be timed goes here> ...

    double diff = ( std::clock() - start ) / (double)CLOCKS_PER_SEC;
    cout << "printf: " << diff << '\n';

    return 0;
}

The above code gives the time in seconds. I wish to get the same in nanoseconds and with more precision.

+3  A: 

In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions and you don't know how long that takes.

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
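A minimal sketch of that loop-timing approach (here api_call is just an empty stand-in for the real function being measured, which in practice would do actual work):

#include <ctime>
#include <iostream>

// Placeholder for the API call being measured.
void api_call() {}

int main() {
    const long iterations = 10000000;            // ten million calls
    std::clock_t start = std::clock();

    for (long i = 0; i < iterations; ++i)
        api_call();

    double total = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    double perCall = total / iterations;          // seconds per call
    std::cout << perCall * 1e9 << " ns per call\n";
    return 0;
}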

Greg Hewgill
Actually, I am trying to get the performance of the API for a particular call. For each run it might give a different time, which may affect the graph I make for the performance improvement... hence the time in nanoseconds. But yeah, this is a great idea; I will consider it.
gagneet
+6  A: 

With that level of accuracy, it would be better to reason in CPU ticks rather than in system calls like clock(). And do not forget that if it takes more than one nanosecond to execute a single instruction, having nanosecond accuracy is pretty much impossible.

Still, something like that is a start:

Here's the actual code to retrieve the number of 80x86 CPU clock ticks elapsed since the CPU was last started. It will work on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can probably be ported quite easily to anything else that supports inline assembly.

inline __int64 GetCpuClocks()
{
    // Counter
    struct { __int32 low, high; } counter;

    // Use RDTSC instruction to get clocks count
    __asm push EAX
    __asm push EDX
    __asm __emit 0fh __asm __emit 031h // RDTSC
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    // Return result
    return *(__int64 *)(&counter);
}

This function also has the advantage of being extremely fast; it usually takes no more than 50 CPU cycles to execute.

Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use one of several very good utilities, or the Win32 call QueryPerformanceFrequency().
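A rough sketch of that conversion, using GetCpuClocks() from above and an assumed (purely hypothetical) 3.0 GHz clock speed in place of a properly measured one:

#include <iostream>

// (GetCpuClocks as defined above)

int main()
{
    const double clockHz = 3.0e9;          // assumed 3.0 GHz chip speed (hypothetical)

    __int64 t0 = GetCpuClocks();
    // ... call the API being measured here ...
    __int64 t1 = GetCpuClocks();

    double seconds = (t1 - t0) / clockHz;
    std::cout << "elapsed: " << seconds * 1e9 << " ns\n";
    return 0;
}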

VonC
Thanks for the information, this is useful. I did not think of using CPU cycles to compute the time; I think that is a very good point to keep in mind :-)
gagneet
Using QueryPerformanceFrequency() to turn TSC counts into elapsed time may not work. QueryPerformanceCounter() uses the HPET (High Precision Event Timer) on Vista when available. It uses the ACPI power management timer if the user adds /USEPMTIMER to boot.ini.
bk1e
+2  A: 

If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for your operating system. POSIX supports up to microsecond precision with gettimeofday, but nothing more precise, since computers didn't have clock frequencies above 1 GHz when it was standardized.

If you are using Boost, you can check boost::posix_time.
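A minimal sketch of what that looks like with boost::posix_time (microsecond resolution; timed_call is just a placeholder for the real API call):

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

// Placeholder for the API call being measured.
void timed_call() {}

int main()
{
    using namespace boost::posix_time;

    ptime start = microsec_clock::local_time();
    timed_call();
    ptime stop = microsec_clock::local_time();

    time_duration elapsed = stop - start;
    std::cout << elapsed.total_microseconds() << " microseconds\n";
    return 0;
}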

Raymond Martineau
I want to keep the code portable; I will look at the Boost library and check if I can bundle this with the code. Thanks :-)
gagneet
A: 

If this is for Linux, I've been using the function gettimeofday, which fills in a struct giving the seconds and microseconds since the Epoch. You can then use timersub to subtract the two readings and convert the difference to whatever precision you want. However, you specify nanoseconds, and it looks like clock_gettime() is what you're looking for: it puts the time, in seconds and nanoseconds, into the structure you pass to it.
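A minimal sketch of the gettimeofday/timersub approach (timersub is a BSD/glibc extension; do_work stands in for the call being measured):

#include <sys/time.h>
#include <stdio.h>

/* Placeholder for the API call being measured. */
static void do_work(void) {}

int main(void)
{
    struct timeval start, stop, elapsed;

    gettimeofday(&start, NULL);
    do_work();
    gettimeofday(&stop, NULL);

    timersub(&stop, &start, &elapsed);   /* elapsed = stop - start */
    printf("%ld.%06ld seconds\n", (long)elapsed.tv_sec, (long)elapsed.tv_usec);
    return 0;
}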

Will Mc
clock_gettime() should do the trick for now. Will try using the same for my purpose...
gagneet
+12  A: 

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().

#include <time.h> // clock_gettime; link with -lrt on older glibc

int main()
{
   timespec ts;
   // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
   clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}

For Windows you want to use QueryPerformanceCounter. And here is more on QPC.
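A minimal sketch of the QueryPerformanceCounter approach (the gap between the two counter reads is where the call being measured would go):

#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);    // counts per second

    QueryPerformanceCounter(&start);
    // ... call the API being measured here ...
    QueryPerformanceCounter(&stop);

    double seconds = double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    std::cout << seconds * 1e9 << " ns\n";
    return 0;
}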

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMDs may also cause a problem. See the second post by sebbbi, where he states:

QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).

grieve
Thanks, I think this will serve my purpose for now... :-)
gagneet
I've seen TSC clock skew on an older dual Xeon PC, but not nearly as bad as on an Athlon X2 with C1 clock ramping enabled. With C1 clock ramping, executing a HLT instruction slows down the clock, causing the TSC on idle cores to increment more slowly than on active cores.
bk1e
CLOCK_MONOTONIC works on the versions of Linux I have available.
Bernard
@Bernard - That must be newly added since I last looked at this. Thanks for the heads up.
grieve
+1  A: 

You can use the following function with gcc running on x86 processors:

unsigned long long rdtsc()
{
  #define rdtsc(low, high) \
         __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))

  unsigned long low, high;
  rdtsc(low, high);
  return ((ulonglong)high << 32) | low;
}

with Digital Mars C++:

unsigned long long rdtsc()
{
   _asm
   {
        rdtsc
   }
}

which reads the high performance timer on the chip. I use this when doing profiling.

Walter Bright
This is useful; I will check if the processor is x86, as I am using an Apple Mac for experimentation... thanks :-)
gagneet
What values is the user supposed to give for high and low? Why do you define a macro inside the body of a function? Also, ulonglong, presumably typedef'd to unsigned long long, isn't a standard type. I'd like to use this but I'm not sure how ;)
Joseph Garvin
unsigned long is not the right thing to use under Linux. You may want to consider using int instead, as long and long long are both 64-bit on 64-bit Linux.
Marius
+4  A: 

I am using the following to get the desired results:

#include <time.h> // clock_gettime, clock_settime; link with -lrt on older glibc
#include <iostream>
using namespace std;

int main (int argc, char** argv) {
    // reset the clock
    timespec tS;
    tS.tv_sec = 0;
    tS.tv_nsec = 0;
    clock_settime(CLOCK_PROCESS_CPUTIME_ID, &tS);
    ...
    ... <code to check for the time to be put here>
    ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tS);
    cout << "Time taken is: " << tS.tv_sec, tS.tv_nsec << endl;

    return 0;
}
gagneet
+1  A: 

To do this correctly you can use one of two ways: either go with RDTSC or with clock_gettime(). The second is about twice as fast and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors and may yield incorrect timing values on certain processors).

#include <stdint.h> // uint32_t, uint64_t

inline uint64_t rdtsc()
{
    uint32_t lo, hi;
    __asm__ __volatile__ (
      "xorl %%eax, %%eax\n"
      "cpuid\n"
      "rdtsc\n"
      : "=a" (lo), "=d" (hi)
      :
      : "%ebx", "%ecx" );
    return (uint64_t)hi << 32 | lo;
}

And for clock_gettime (I chose microsecond resolution arbitrarily):

#include <stdint.h>
#include <time.h>
#include <sys/timeb.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}

The timing and values produced:

Absolute values:
rdtsc           = 4571567254267600
clock_gettime   = 1278605535506855

Processing time: (10000000 runs)
rdtsc           = 2292547353
clock_gettime   = 1031119636
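For reference, one way such a comparison could be driven (a sketch only, not necessarily the exact harness behind the numbers above; it assumes the rdtsc() and ClockGetTime() definitions given earlier, and uses a volatile sink so the compiler cannot optimise the loops away):

#include <stdio.h>
#include <stdint.h>

// (rdtsc() and ClockGetTime() as defined above)

int main()
{
    const int runs = 10000000;
    volatile uint64_t sink = 0;          // keeps the calls from being optimised away

    uint64_t t0 = ClockGetTime();
    for (int i = 0; i < runs; ++i)
        sink += rdtsc();
    uint64_t t1 = ClockGetTime();

    uint64_t t2 = ClockGetTime();
    for (int i = 0; i < runs; ++i)
        sink += ClockGetTime();
    uint64_t t3 = ClockGetTime();

    printf("rdtsc loop:         %llu mcs\n", (unsigned long long)(t1 - t0));
    printf("clock_gettime loop: %llu mcs\n", (unsigned long long)(t3 - t2));
    return 0;
}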
Marius