What is the fastest timing system a C/C++ programmer can use?

For example:
time() will give the seconds since Jan 01 1970 00:00.
GetTickCount() on Windows will give the time, in milliseconds, since the system's start-up time, but is limited to 49.7 days (after that it simply wraps back to zero).

I want to get the current time, or ticks since system/app start-up time, in milliseconds.

The biggest concern is the method's overhead - I need the lightest one, because I'm about to call it many many times per second.

My case is that I have a worker thread, and to that worker thread I post pending jobs. Each job has an "execution time". So, I don't care if the time is the current "real" time or the time since the system's uptime - it just must be linear and light.


unsigned __int64 GetTickCountEx()
{
    static DWORD dwWraps = 0;
    static DWORD dwLast  = 0;

    DWORD dwCurrent = GetTickCount();

    // GetTickCount() wrapped past zero since the last call
    if (dwLast > dwCurrent)
        dwWraps++;

    dwLast = dwCurrent;

    // Each wrap is 2^32 milliseconds (0x100000000, not 0xFFFFFFFF)
    unsigned __int64 timeResult = ((unsigned __int64)dwWraps << 32) + dwCurrent;

    return timeResult;
}
+13  A: 

For timing, the current Microsoft recommendation is to use QueryPerformanceCounter & QueryPerformanceFrequency.

This will give you better-than-millisecond timing. If the system doesn't support a high-resolution timer, then it will default to milliseconds (the same as GetTickCount).

Here is a short Microsoft article with examples of why you should use it :)

At the first link, at clause #1, it says: "developers should attempt to have their games call QueryPerformanceCounter as little as possible to avoid any performance penalty". So, should I conclude that I better use RDTSC? It sounds like a very platform-dependent call.
Ok, I read a bit more, and RDTSC isn't the choice.
Additionally, MSDN recommends not to call QueryPerformanceCounter too many times. So, am I left with GetTickCount?
So, which one offers less overhead?
RDTSC will arguably be the fastest since it's just a single instruction. Both `GetTickCount` and `QueryPerformanceCounter` will be slower (and possibly to the same extent) because they need to go through all the system call overhead. But again, this overhead probably doesn't matter unless your program is absolutely time-critical.
@Poni: According to the last article, the overhead for calling it is smaller than the *resolution* of the other counters, so I wouldn't worry about that. If you read the first article you'll see `"[QueryPerformanceCounter] is an API that can be called several hundred times per frame without any noticeable impact"` -- and remember that this is 5 years ago, as well. (The first article also details why you probably don't want to use RDTSC.)
And RDTSC isn't free, either. On some architectures it can freeze *all cores* for hundreds to thousands of clock cycles. On other architectures, each core can give a slightly different readout. If you're running SQL Server, your event logs will occasionally be peppered with complaints about RDTSC being off.
Ok, so RDTSC is definitely not the choice. @Porges: Are you sure they refer to QueryPerformanceCounter? My English skills need an improvement then (: So guys, just to conclude it, which function offers me less overhead? And yes again, I do not need microsecond resolution; all I need is one that I can call many, many times per second.
@Poni: quit worrying about the overhead. You are not going to call this tens of millions of times per second, and therefore, its overhead is not going to be noticeable. And if it is, you'll only know *afterwards*. So use QPC, and then if you encounter any performance problems, measure it and see *if* the overhead of QPC is the problem, and if it is, *then* you can start finding a replacement.

On Linux you get microseconds:

struct timeval tv;
int res = gettimeofday(&tv, NULL);
double tmp = (double) tv.tv_sec + 1e-6 * (double) tv.tv_usec;

On Windows, only milliseconds are available:

SYSTEMTIME st;
GetSystemTime(&st);
/* ...convert st's date/time fields into seconds first, then: */
tmp += 1e-3 * st.wMilliseconds;

return tmp;

This came from R's datetime.c (and was edited down for brevity).

Then there is of course Boost's Date_Time which can have nanosecond resolution on some systems (details here and here).

Dirk Eddelbuettel
+1  A: 

If you're just worried about GetTickCount() overflowing, then you can just wrap it like this:

DWORDLONG GetLongTickCount(void)
{
    static DWORDLONG last_tick = 0;
    DWORD tick = GetTickCount();

    if (tick < (last_tick & 0xffffffff))
        last_tick += 0x100000000;

    last_tick = (last_tick & 0xffffffff00000000) | tick;
    return last_tick;
}

If you want to call this from multiple threads you'll need to lock access to the last_tick variable. As long as you call GetLongTickCount() at least once every 49.7 days, it'll detect the overflow.


POSIX supports clock_gettime() which uses a struct timespec which has nanosecond resolution. Whether your system really supports that fine-grained a resolution is more debatable, but I believe that's the standard call with the highest resolution. Not all systems support it (MacOS X 10.6.4 does not, AFAICT), and it is sometimes well hidden (library '-lposix4' on Solaris, IIRC).

Jonathan Leffler
+2  A: 

I recently had this question and did some research. The good news is that all three of the major operating systems provide some sort of high-resolution timer. The bad news is that it is a different API call on each system. For POSIX operating systems you want to use clock_gettime(). Mac OS X doesn't support that, however; there you have to use mach_absolute_time(). For Windows, use QueryPerformanceCounter. Alternatively, with compilers that support OpenMP, you can use omp_get_wtime(), but it may not provide the resolution that you are looking for.

I also found cycle.h from fftw.org (www.fftw.org/cycle.h) to be useful.

Here is some code that calls a timer on each OS, using some ugly #ifdef statements. The usage is very simple: Timer t; t.tic(); SomeOperation(); t.toc("Message"); And it will print out the elapsed time in seconds.

#ifndef TIMER_H
#define TIMER_H

#include <iostream>
#include <string>
#include <vector>

# if  (defined(__MACH__) && defined(__APPLE__))
#   define _MAC
# elif (defined(_WIN32) || defined(WIN32) || defined(__CYGWIN__) || defined(__MINGW32__) || defined(_WIN64))
#   define _WINDOWS
#   ifndef WIN32_LEAN_AND_MEAN
#     define WIN32_LEAN_AND_MEAN
#   endif
# endif

# if defined(_MAC)
#    include <mach/mach_time.h>
# elif defined(_WINDOWS)
#    include <windows.h>
# else
#    include <time.h>
# endif

#if defined(_MAC)
  typedef uint64_t timer_t;
  typedef double   timer_c;
#elif defined(_WINDOWS)
  typedef LONGLONG      timer_t;
  typedef LARGE_INTEGER timer_c;
#else
  typedef double   timer_t;
  typedef timespec timer_c;
#endif

  // Timer
  // A quick class to do benchmarking.
  // Example: Timer t;  t.tic();  SomeSlowOp(); t.toc("Some Message");

  class Timer {
  public:
    Timer();

    inline void tic();
    inline void toc();
    inline void toc(const std::string &msg);

    void print(const std::string &msg);
    void print();
    void reset();
    double getTime();

  private:
    timer_t start;
    double duration;
    timer_c ts;
    double conv_factor;
    double elapsed_time;
  };

  Timer::Timer() {
#if defined(_MAC)
    mach_timebase_info_data_t info;
    mach_timebase_info(&info);

    // mach_absolute_time() counts in (numer/denom) nanosecond units
    conv_factor = (static_cast<double>(info.numer))/
                  (static_cast<double>(info.denom));
    conv_factor = conv_factor*1.0e-9;
#elif defined(_WINDOWS)
    timer_c freq;
    QueryPerformanceFrequency(&freq);
    conv_factor = 1.0/(static_cast<double>(freq.QuadPart));
#else
    // clock_gettime() is already converted to seconds below
    conv_factor = 1.0;
#endif
    reset();
  }

  inline void Timer::tic() {
#if defined(_MAC)
    start = mach_absolute_time();
#elif defined(_WINDOWS)
    QueryPerformanceCounter(&ts);
    start = ts.QuadPart;
#else
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    start = static_cast<double>(ts.tv_sec) + 1.0e-9 *
            static_cast<double>(ts.tv_nsec);
#endif
  }

  inline void Timer::toc() {
#if defined(_MAC)
    duration = static_cast<double>(mach_absolute_time() - start);
#elif defined(_WINDOWS)
    QueryPerformanceCounter(&ts);
    duration = static_cast<double>(ts.QuadPart - start);
#else
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    duration = (static_cast<double>(ts.tv_sec) + 1.0e-9 *
                static_cast<double>(ts.tv_nsec)) - start;
#endif
    elapsed_time = duration*conv_factor;
  }

  inline void Timer::toc(const std::string &msg) { toc(); print(msg); }

  void Timer::print(const std::string &msg) {
    std::cout << msg << " "; print();
  }

  void Timer::print() {
    if(elapsed_time) {
      std::cout << "elapsed time: " << elapsed_time << " seconds\n";
    }
  }

  void Timer::reset() { start = 0; duration = 0; elapsed_time = 0; }
  double Timer::getTime() { return elapsed_time; }

#if defined(_WINDOWS)
#  undef _WINDOWS
#endif

#endif // TIMER_H
Nice, very useful for testing! As for using it in a project - I'm trying to avoid QPC because it's not 100% reliable, as some said here.

I'd suggest that you use the GetSystemTimeAsFileTime API if you're specifically targeting Windows. It's generally faster than GetSystemTime and has the same precision (which is some 10-15 milliseconds - don't look at the resolution); when I did a benchmark some years ago under Windows XP it was somewhere in the range of 50-100 times faster.

The only disadvantage is that you might have to convert the returned FILETIME structures to a clock time using e.g. FileTimeToSystemTime if you need to access the returned times in a more human-friendly format. On the other hand, as long as you don't need those converted times in real-time you can always do this off-line or in a "lazy" fashion (e.g. only convert the time stamps you need to display/process, and only when you actually need them).

QueryPerformanceCounter can be a good choice as others have mentioned, but the overhead can be rather large depending on the underlying hardware support. In the benchmark I mention above, QueryPerformanceCounter calls were 25-200 times slower than calls to GetSystemTimeAsFileTime. Also, there are some reliability problems, as e.g. reported here.

So, in summary: If you can cope with a precision of 10-15 milliseconds I'd recommend you to use GetSystemTimeAsFileTime. If you need anything better than that I'd go for QueryPerformanceCounter.

Small disclaimer: I haven't performed any benchmarking under Windows versions later than XP SP3. I'd recommend you do some benchmarking on your own.

Thank you for a great answer! The first link leads to another very useful link, for your information: Implementing a High-Resolution Time Provider for Windows http://msdn.microsoft.com/en-us/magazine/cc163996.aspx

If you are targeting a late enough version of the OS then you could use GetTickCount64() which has a much higher wrap around point than GetTickCount(). You could also simply build a version of GetTickCount64() on top of GetTickCount().

Len Holgate
Did that, yet it forces me to wrap the code with a mutex, which doesn't seem exactly right. Why? Because eventually, I think, I'm using two mutexes - one for the wrapper and one for GetTickCount(). See my edit. Tell me what you think, thanks Len!!
I wouldn't do it like that. I'd have a timer increment the high part of a ULONGLONG at the appropriate time, perhaps using an interlocked increment with no lock, and then simply use it in the call to GetTickCountEx(). Right now you're doing unnecessary work for just under 49.7 days, plus synchronising all threads that use it (and is that a critical section rather than a mutex?). You can probably avoid the lock entirely, or use a reader/writer lock and only take it exclusively when you're about to update the high part... You could use my "tick shifter" API interception tool to test the code http://www.lenholgate.com/archives/000647.html
Len Holgate
Good point Len. I'll have to avoid the lock (and yes, it's a critical section). The timer-to-increase the high part is a good idea.