views:

296

answers:

8

On SO, there are quite a few questions about performance profiling, but I don't seem to find the whole picture. There are quite a few issues involved, and most Q&A ignore all but a few at a time, or don't justify their proposals.

What I'm wondering about: if I have two functions that do the same thing and I'm curious about the difference in speed, does it make sense to test this without external tools, with timers, or will this compiled-in testing affect the results too much?

I ask this because, if it is sensible, as a C++ programmer I want to know how it should best be done, since timers are much simpler than external tools. If it makes sense, let's proceed with all the possible pitfalls:

Consider this example. The following code shows two ways of doing the same thing:

#include <algorithm>
#include <cstddef>
#include <ctime>
#include <iostream>

typedef unsigned char byte;

inline
void
swapBytes( byte* in, size_t n )   // byte*, not void*: void* cannot be indexed
{
   for( size_t lo=0, hi=n-1; hi>lo; ++lo, --hi )
   {
      in[lo] ^= in[hi];
      in[hi] ^= in[lo];
      in[lo] ^= in[hi];
   }
}

int
main()
{
         byte    arr[8]     = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
   const int     iterations = 100000000;
         clock_t begin      = clock();

   for( int i=iterations; i!=0; --i ) 

      swapBytes( arr, 8 );

   clock_t middle = clock();

   for( int i=iterations; i!=0; --i ) 

      std::reverse( arr, arr+8 );

   clock_t end = clock();

   double secSwap = (double) ( middle-begin ) / CLOCKS_PER_SEC;
   double secReve = (double) ( end-middle   ) / CLOCKS_PER_SEC;


   std::cout << "swapBytes,    for:    "   << iterations << " times takes: " << middle-begin
             << " clock ticks, which is: " << secSwap    << "sec."           << std::endl;

   std::cout << "std::reverse, for:    "   << iterations << " times takes: " << end-middle
             << " clock ticks, which is: " << secReve    << "sec."           << std::endl;

   std::cin.get();
   return 0;
}

// Output:

// Release:
//  swapBytes,    for: 100000000 times takes: 3000 clock ticks, which is: 3sec.
//  std::reverse, for: 100000000 times takes: 1437 clock ticks, which is: 1.437sec.

// Debug:
//  swapBytes,    for: 10000000 times takes: 1781  clock ticks, which is: 1.781sec.
//  std::reverse, for: 10000000 times takes: 12781 clock ticks, which is: 12.781sec.

The issues:

  1. Which timers to use, and how to get the CPU time actually consumed by the code in question?
  2. What are the effects of compiler optimization (since these functions just swap bytes back and forth, the most efficient thing is obviously to do nothing at all)?
  3. Considering the results presented here, do you think they are accurate (I can assure you that multiple runs give very similar results)? If yes, can you explain how std::reverse gets to be so fast, considering the simplicity of the custom function? I don't have the source code of the VC++ version that I used for this test, but here is the implementation from GNU. It boils down to the function iter_swap, which is completely incomprehensible to me. Would that also be expected to run twice as fast as the custom function, and if so, why?

Contemplations:

  1. It seems two high-precision timers are being proposed: clock() and QueryPerformanceCounter (on Windows). Obviously we would like to measure the CPU time of our code and not the real time, but as far as I understand these functions don't give that functionality, so other processes on the system would interfere with measurements. This page on the GNU C library seems to contradict that, but when I put a breakpoint in VC++, the debugged process gets a lot of clock ticks even though it was suspended (I have not tested under GNU). Am I missing alternative counters for this, or do we need special libraries or classes for this? If not, is clock() good enough in this example, or would there be a reason to use QueryPerformanceCounter?

  2. What can we know for certain without debugging, disassembling and profiling tools? Is anything actually happening? Is the function call being inlined or not? When checking in the debugger the bytes do actually get swapped, but I'd rather know from theory why than from testing.

Thanks for any directions.

update

Thanks to a hint from tojas, the swapBytes function now runs as fast as std::reverse. I had failed to realize that the temporary copy, in the case of a byte, need only be a register, and is thus very fast. Elegance can blind you.

inline
void
swapBytes( byte* in, size_t n )
{
   byte t;

   for( size_t i=0; i<n-1-i; ++i )   // use n, not a hard-coded 7
   {
      t           = in[i];
      in[i]       = in[n-1-i];
      in[n-1-i]   = t;
   }
}

Thanks to a tip from ChrisW I have found that on Windows you can get the actual CPU time consumed by a (read: your) process through Windows Management Instrumentation. This definitely looks more interesting than the high-precision counter.

+1  A: 

Is there something you have against profilers? They help a ton. Since you are on WinXP, you should really give a trial of VTune a try. Try a call-graph sampling test and look at the self time and total time of the functions being called. There's no better way to tune your program so that it's the fastest possible without being an assembly genius (and a truly exceptional one).

Some people just seem to be allergic to profilers. I used to be one of those and thought I knew best about where my hotspots were. I was often correct about obvious algorithmic inefficiencies, but practically always incorrect about more micro-optimization cases. Just rewriting a function without changing any of the logic (ex: reordering things, putting exceptional case code in a separate, non-inlined function, etc) can make functions a dozen times faster and even the best disassembly experts usually can't predict that without the profiler.

As for relying on simplistic timing tests alone, they are extremely problematic. Your current test is not so bad, but it's a very common mistake to write timing tests in ways in which the optimizer will optimize out dead code, so that you end up measuring the time it takes to do essentially a nop or even nothing at all. You should have enough knowledge to interpret the disassembly and make sure the compiler isn't doing this.

Also, timing tests like this have a tendency to bias the results significantly, since a lot of them just involve running your code over and over in the same loop, which tends to test only the case where all the memory is in the cache and branch prediction is working perfectly for it. It's often just showing you best-case scenarios without showing you the average, real-world case.

Depending on real world timing tests is a little bit better; something closer to what your application will be doing at a high level. It won't give you specifics about what is taking what amount of time, but that's precisely what the profiler is meant to do.

I have used profilers before to optimize the performance of entire programs, but given curiosity about a few simple functions, calling a few timers is definitely less hassle than choosing, downloading, installing, reading the manual for and getting to work with a profiler. Altogether, there is a difference between understanding underlying stuff like this and getting your software to reasonable performance. For the latter I would happily use a profiler, and the speed of std::reverse would most likely not worry me at all, unless I was reversing gigabytes...
ufotds
If you are just going for acceptable performance and not exceptional performance, then a timing test might do. However, it's important to keep in mind that while a profiler might take a little time to learn, it's just something you have to do once. In VTune, just use the call-graph sampling wizard, select your exe file, and run it. The only tricky part is that you need to modify your project's settings (http://software.intel.com/en-us/articles/performance-tools-for-software-developers-using-the-intel-compilers-with-vtune-analyzer-or-intel-thread-profiler/). After that, just run and look at the graph.
... self time will then tell you how much time the cpu spends in a given function/class method excluding calls to other functions/methods, and total time will give you the total amount of time spent in a function/method including the time spent to call other functions/methods. It's just like a timing test except you get the time spent for every function called in your test including the total time spent in main.
Can you recommend a good, *free* profiler for Windows? Something like gprof under unix?
zvrba
CodeAnalyst is free if you have an AMD machine. If you have intel, you can still use it but only for TBS (time-based sampling).
@zvrba: Please take a look at this: http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343
Mike Dunlavey
+1  A: 

I would suppose that anyone competent enough to answer all your questions is going to be far too busy to answer all your questions. In practice it is probably more effective to ask a single, well-defined question. That way you may hope to get well-defined answers which you can collect and be on your way to wisdom.

So, anyway, perhaps I can answer your question about which clock to use on Windows.

clock() is not considered a high-precision clock. If you look at the value of CLOCKS_PER_SEC you will see it has a resolution of 1 millisecond. This is only adequate if you are timing very long routines, or a loop with 10000's of iterations. As you point out, if you try to repeat a simple method 10000's of times in order to get a time that can be measured with clock(), the compiler is liable to step in and optimize the whole thing away.

So, really, the only clock to use is QueryPerformanceCounter().

ravenspoint
+2  A: 

Obviously we would like to measure the cpu time of our code and not the real time, but as far as I understand, these functions don't give that functionality, so other processes on the system would interfere with measurements.

I do two things, to ensure that wall-clock time and CPU time are approximately the same thing:

  • Test for a significant length of time, i.e. several seconds (e.g. by testing a loop of however many thousands of iterations)

  • Test when the machine is more or less idle, except for whatever I'm testing.

Alternatively if you want to measure only/more exactly the CPU time per thread, that's available as a performance counter (see e.g. perfmon.exe).

What can we know for certain without debugging, disassembling and profiling tools?

Nearly nothing (except that I/O tends to be relatively slow).

ChrisW
perfmon, yes thanks for reminding me. I knew it existed, and it is pretty convenient, but do you know if there are system calls we can use to get this information inside our program?
ufotds
@ufotds - When I did it, long ago, I used hairy calls to read the hidden "performance" section of the registry (the calls were easy, but parsing the binary data which they returned was not). Nowadays it might, I don't know, be abstracted by the "WMI" API.
ChrisW
+2  A: 

To answer your main question: the "reverse" algorithm just swaps elements of the array; it does not operate on the elements themselves the way your XOR version does.

tojas
+2  A: 

Is it safe to say you're asking two questions?

  • Which one is faster, and by how much?

  • And why is it faster?

For the first, you don't need high precision timers. All you need to do is run them "long enough" and measure with low precision timers. (I'm old-fashioned, my wristwatch has a stop-watch function, and it is entirely good enough.)

For the second, surely you can run the code under a debugger and single-step it at the instruction level. Since the basic operations are so simple, you will be able to easily see roughly how many instructions are required for the basic cycle.

Think simple. Performance is not a hard subject. Usually, people are trying to find problems, for which this is a simple approach.

Mike Dunlavey
yes, more than two even... but for some reason the visual debugger wouldn't let me step into std::reverse, though I had only tried in release mode. Now, in debug, it works and I can actually see that it does almost exactly what I wrote in the update of swapBytes, apart from verifying pointers etc...
ufotds
+2  A: 

Use QueryPerformanceCounter on Windows if you need high-resolution timing. The counter's accuracy depends on the CPU, but it can go up to per clock pulse. However, profiling real-world operations is always a better idea.

DeadMG
It also depends on when it is called. Many CPUs change clock frequency dynamically.
Pavel Radzivilovsky
A: 

Wha? How to measure speed without a profiler? The very act of measuring speed is profiling! The question amounts to, "how can I write my own profiler?" And the answer is clearly, "don't".

Besides, you should be using std::swap in the first place, which completely invalidates this whole pointless pursuit.

-1 for pointlessness.

John
std::reverse is a wrapper around std::swap...
ufotds
I didn't downvote, but one thing I've learned on SO is, go easy on people. We all come in with different levels of background, and we can share others' wisdom. Clearly you have wisdom to share. That's a good thing about SO.
Mike Dunlavey
Mike: Point taken. You are more patient than I am. That aside, do YOU think this question is valid? I'm quickly learning that sensible questions are rare on here. The optimization questions alone make me worry about what applications these folks are programming. I hope my bank isn't employing programmers to wonder whether they should roll their own std::swap! :)
John
ufotds: So what? Substitute "std::reverse" for "std::swap" in my post. The OP should be using std::whatever instead of worrying about rolling his own, worrying about whether his own is faster, and -- the part that just makes no sense -- avoiding using a profiler.
John
@John: I am amazed that applications work as well as they do, given the silly ideas programmers have on many subjects, including performance.
Mike Dunlavey
+2  A: 

(This answer is specific to Windows XP and the 32-bit VC++ compiler.)

The easiest thing for timing little bits of code is the time-stamp counter of the CPU. This is a 64-bit value, a count of the number of CPU cycles run so far, which is about as fine a resolution as you're going to get. The actual numbers you get aren't especially useful as they stand, but if you average out several runs of various competing approaches then you can compare them that way. The results are a bit noisy, but still valid for comparison purposes.

To read the time-stamp counter, use code like the following:

LARGE_INTEGER tsc;
__asm {
    cpuid
    rdtsc
    mov tsc.LowPart,eax
    mov tsc.HighPart,edx
}

(The cpuid instruction is there to ensure that there aren't any incomplete instructions waiting to complete.)

There are four things worth noting about this approach.

Firstly, because of the inline assembly language, it won't work as-is on MS's x64 compiler. (You'll have to create a .ASM file with a function in it. An exercise for the reader; I don't know the details.)

Secondly, to avoid problems with cycle counters not being in sync across different cores/threads/what have you, you may find it necessary to set your process's affinity so that it only runs on one specific execution unit. (Then again... you may not.)

Thirdly, you'll definitely want to check the generated assembly language to ensure that the compiler is generating roughly the code you expect. Watch out for bits of code being removed, functions being inlined, that sort of thing.

Finally, the results are rather noisy. The cycle counters count cycles spent on everything, including waiting for caches, time spent on running other processes, time spent in the OS itself, etc. Unfortunately, it's not possible (under Windows, at least) to time just your process. So, I suggest running the code under test a lot of times (several tens of thousands) and working out the average. This isn't very cunning, but it seems to have produced useful results for me at any rate.

brone
hi, thanks for this snippet. I doubt its practical value for this purpose, since apparently with WMI it *is* possible to measure only your process, but I pasted it into a simple C++ program and it does work as-is. On top of that, it's the first time I used inline assembler, because my assembler knowledge is rather weak...
ufotds