views: 288
answers: 9

I want to measure the performance of a function with microsecond precision on the Windows platform.

Now, Windows itself only has millisecond granularity, so how can I achieve this?

I tried the following sample, but I am not getting correct results.

#include <windows.h>
#include <stdio.h>

int main()
{
    LARGE_INTEGER ticksPerSecond = {0};
    LARGE_INTEGER tick_1 = {0};
    LARGE_INTEGER tick_2 = {0};
    double uSec = 1000000;

    // Get the counter frequency (ticks per second)
    QueryPerformanceFrequency(&ticksPerSecond);

    // Calculate the ticks per microsecond
    double uFreq = ticksPerSecond.QuadPart / uSec;

    // Get the counter before the start of the operation
    QueryPerformanceCounter(&tick_1);

    // The operation itself
    Sleep(10);

    // Get the counter after the operation finished
    QueryPerformanceCounter(&tick_2);

    // And now the operation time in microseconds
    double diff = (tick_2.QuadPart / uFreq) - (tick_1.QuadPart / uFreq);
    printf("Elapsed: %f microseconds\n", diff);
    return 0;
}
+19  A: 

Run the operation in a loop a million times or so and divide the result by that number. That way you'll get the average execution time over that many executions. Timing one (or even a hundred) executions of a very fast operation is very unreliable, due to multitasking and whatnot.
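
A minimal sketch of this approach, reusing the QueryPerformanceCounter API from the question (DoWork() is a hypothetical stand-in for the operation under test):

#include <windows.h>
#include <stdio.h>

volatile int sink = 0;
void DoWork() { ++sink; } // stand-in for the operation under test

int main()
{
    const int iterations = 1000000;
    LARGE_INTEGER freq, start, end;

    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    // Run the operation many times so per-call noise averages out
    for (int i = 0; i < iterations; ++i)
        DoWork();

    QueryPerformanceCounter(&end);

    // Total elapsed microseconds, divided by the iteration count
    double totalUs = (end.QuadPart - start.QuadPart) * 1000000.0 / freq.QuadPart;
    printf("Average: %f microseconds per call\n", totalUs / iterations);
    return 0;
}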

Matti Virkkunen
Note: If the operation is fast enough, the loop itself might take up a measurable portion of the time you are trying to measure. Keep that overhead in mind.
Brian
Caching will take effect and speed up the operation if you loop over it.
PiNoYBoY82
Looping 1M times is nothing compared to the seconds it's going to run. Different story if you want to measure nanoseconds. Oh and just use a stopwatch ;-)
phkahler
A: 

If you're doing this for offline profiling, a very simple way is to run the function 1000 times, measure to the closest millisecond and divide by 1000.

Wim
+7  A: 
  • compile it
  • look at the assembler output
  • count the number of each instruction in your function
  • apply the cycles per instruction on your target processor
  • end up with a cycle count
  • multiply by the clock speed you are running at
  • apply arbitrary scaling factors to account for cache misses and branch mis-predictions lol

(man I am so going to get down-voted for this answer)

vicatcu
Not downvoting, I'll just note that the last line (cache misses and branch mis-predictions) pretty much screws the very careful count of cpu cycles you had obtained so far :p
Matthieu M.
+1 funny. However, if you're serious, this is terrible advice. As sarcasm it's a very good example of why to choose Matti's answer.
caspin
Downvoting? Not by me. I used to do that, actually. However, with caching now it doesn't really work. So I recommend the run-it-a-million-times method.
Mike Dunlavey
Depending on your architecture, this is a perfectly valid and accurate method. Not all processors have caches, branch prediction or multitasking. Although, I'd note that the number of cycles per instruction may be variable, even depending on the arguments...
Nathan Ernst
+3  A: 

No, you are probably getting an accurate result; QueryPerformanceCounter() works well for timing short intervals. What's wrong is your expectation of the accuracy of Sleep(). It has a resolution of 1 millisecond, but its accuracy is far worse: no better than about 15.625 milliseconds on most Windows machines.

To get it anywhere close to 1 millisecond, you'll have to call timeBeginPeriod(1) first. That probably will improve the match, ignoring the jitter you'll get from Windows being a multi-tasking operating system.
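
A sketch of how that call fits around the question's code (timeBeginPeriod/timeEndPeriod live in winmm.lib; the Sleep(10) is the operation from the question):

#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "winmm.lib") // timeBeginPeriod/timeEndPeriod

int main()
{
    // Ask the scheduler for 1 ms timer resolution before timing Sleep()
    timeBeginPeriod(1);

    LARGE_INTEGER freq, t1, t2;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t1);
    Sleep(10);
    QueryPerformanceCounter(&t2);

    double us = (t2.QuadPart - t1.QuadPart) * 1000000.0 / freq.QuadPart;
    printf("Sleep(10) took %f microseconds\n", us);

    // Restore the previous timer resolution when done
    timeEndPeriod(1);
    return 0;
}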

Hans Passant
Or use select() with a bogus fd to get a more accurate "sleep".
PiNoYBoY82
A: 

To get finer resolution than 1 ms, you will have to consult your OS documentation. There may be APIs that provide timers with microsecond resolution. If so, run your application many times and take the average.

Thomas Matthews
There is. It's called `QueryPerformanceCounter`. Exactly as mentioned by the OP.
Alan
A: 

Get a proper profiler.

DeadMG
A: 

Or you can use gettimeofday(), which gives you a timeval struct containing a timestamp with microsecond (µs) resolution.
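
A minimal sketch of that approach. Note that gettimeofday() is a POSIX call rather than part of the Win32 API, so on Windows it would only be available through a compatibility layer:

#include <sys/time.h>
#include <stdio.h>

int main()
{
    struct timeval t1, t2;

    gettimeofday(&t1, NULL);
    // ... the operation under test ...
    gettimeofday(&t2, NULL);

    // Elapsed time in microseconds
    long us = (t2.tv_sec - t1.tv_sec) * 1000000L + (t2.tv_usec - t1.tv_usec);
    printf("Elapsed: %ld microseconds\n", us);
    return 0;
}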

Moons
http://blogs.msdn.com/b/oldnewthing/archive/2005/09/02/459952.aspx
dan04
A: 

I like Matti Virkkunen's answer. Check the time, call the function a large number of times, check the time when you finish, and divide by the number of times you called the function. He did mention you might be off due to OS interrupts. You might vary the number of times you make the call and see a difference. Can you raise the priority of the process? Can you get it so that all the calls happen within a single OS time slice?

Since you don't know when the OS might swap you out, you can put this all inside a larger loop to do the whole measurement a large number of times, and save the smallest number, as that is the one that had the fewest OS interrupts. This still may be greater than the actual time for the function to execute, because it may still contain some OS interrupts.
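
A sketch of that outer-loop idea under the same assumptions as above (DoWork() is a hypothetical stand-in; the loop counts are arbitrary):

#include <windows.h>
#include <stdio.h>

volatile int sink = 0;
void DoWork() { ++sink; } // stand-in for the function being measured

int main()
{
    const int innerLoops = 100000; // calls timed per measurement
    const int outerRuns  = 50;     // measurements taken; keep the smallest
    LARGE_INTEGER freq, t1, t2;
    QueryPerformanceFrequency(&freq);

    double best = 1e30;
    for (int run = 0; run < outerRuns; ++run)
    {
        QueryPerformanceCounter(&t1);
        for (int i = 0; i < innerLoops; ++i)
            DoWork();
        QueryPerformanceCounter(&t2);

        double us = (t2.QuadPart - t1.QuadPart) * 1000000.0 / freq.QuadPart;
        if (us < best) best = us; // smallest time had the fewest interrupts
    }
    printf("Best average: %f microseconds per call\n", best / innerLoops);
    return 0;
}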

Jim Tshr
A: 

Sanjeet,

It looks (to me) like you're doing this exactly right. QueryPerformanceCounter is a perfectly good way to measure short periods of time with a high degree of precision. If you're not seeing the result you expected, it's most likely because the sleep isn't sleeping for the amount of time you expected it to! However, it is likely being measured correctly.

I want to go back to your original question about how to measure the time on windows with microsecond precision. As you already know, the high performance counter (i.e. QueryPerformanceCounter) "ticks" at the frequency reported by QueryPerformanceFrequency. That means that you can measure time with precision equal to:

1/frequency seconds

On my machine, QueryPerformanceFrequency reports 2337910 counts/sec. That means my computer's QPC can measure with a precision of 1/2337910 ≈ 4.277e-7 seconds, or 0.427732 microseconds; that is the smallest bit of time I can measure. This, of course, gives you the precision that you originally asked for :) Your machine's frequency should be similar, but you can always do the math and check it.
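
A quick sketch to check this on your own machine (the 2337910 figure above is just what this particular machine reported):

#include <windows.h>
#include <stdio.h>

int main()
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    // The smallest measurable interval is one counter tick: 1/frequency seconds
    double precisionUs = 1000000.0 / (double)freq.QuadPart;
    printf("QPC frequency: %lld counts/sec\n", (long long)freq.QuadPart);
    printf("Precision: %f microseconds per tick\n", precisionUs);
    return 0;
}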

Matt