I am taking a course on computational geometry in the fall, where we will be implementing some algorithms in C or C++ and benchmarking them. Most of the students generate a few datasets and measure their programs with the time command, but I would like to be a bit more thorough.

I am thinking about writing a program to automatically generate different datasets, run my program with them and use R to test hypotheses and estimate parameters.

So... How do you measure program running time more accurately?

What might be relevant to measure?

What hypotheses might be interesting to test (variance, effects caused by caching, etc.)?

Should I test my code on more than one machine? How should these machines differ?

My overall goals are to learn how these algorithms perform in practice, which implementation techniques are better and how the hardware actually performs.

A: 

You could use the Windows API timing functions (they are not that exact), or you can use the RDTSC instruction, which gives sub-nanosecond resolution (don't forget that the instruction and the code around it add a small overhead of a few hundred cycles, but this is not a big issue).
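
A minimal sketch of reading the TSC via the __rdtsc() intrinsic (GCC/Clang: x86intrin.h, MSVC: intrin.h) rather than raw inline assembly; the loop is just a placeholder workload:

    // Minimal sketch: timing a code section with the TSC via __rdtsc().
    // Assumes an x86 CPU; cycle counts are only comparable on one machine.
    #include <x86intrin.h>   // MSVC: use <intrin.h> instead
    #include <cstdint>
    #include <cstdio>

    volatile long long sink; // keeps the loop from being optimized away

    int main() {
        uint64_t start = __rdtsc();
        for (int i = 0; i < 1000000; ++i)   // placeholder workload
            sink += i;
        uint64_t end = __rdtsc();
        printf("elapsed: %llu cycles\n", (unsigned long long)(end - start));
    }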

Quonux
+1  A: 

In order to get better accuracy with program metrics, you will have to run your program many times, such as 100 or 1000.
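For example, a repeat-and-record harness might look like the following sketch; the workload function, run count, and output format are placeholders (std::chrono::steady_clock is used here rather than any particular profiler):

    // Hedged sketch: run the code under test many times and record one
    // sample per run, e.g. for later analysis in R.
    #include <chrono>
    #include <vector>
    #include <cstdio>

    volatile double sink;              // prevents the workload from being optimized out
    void algorithm_under_test() {      // placeholder standing in for the real algorithm
        double x = 0;
        for (int i = 0; i < 100000; ++i) x += i * 0.5;
        sink = x;
    }

    int main() {
        const int runs = 1000;         // e.g. 100 or 1000 runs
        std::vector<double> samples;
        samples.reserve(runs);
        for (int i = 0; i < runs; ++i) {
            auto start = std::chrono::steady_clock::now();
            algorithm_under_test();
            auto end = std::chrono::steady_clock::now();
            samples.push_back(std::chrono::duration<double>(end - start).count());
        }
        for (double s : samples)
            printf("%g\n", s);         // one sample per line, easy to read into R
    }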

For more details on metrics, search the web for metrics and profiling.

Beware that programs may differ in performance (time) measurements due to things running in the background such as virus scanners, music players, and other programs with timers in them.

You could test your program on different machines. Processor clock rates, L1 and L2 cache sizes, RAM sizes, and Disk speeds are all factors (as well as the number of other programs / tasks running concurrently). Floating point may also be a factor.

If you want, you can challenge your compiler by printing the assembly listings for various optimization settings. See which settings produce the fewest instructions or the most efficient assembly code.

Since you're processing data, look at data-driven design: http://www.gamearchitect.net/Articles/DataDrivenDesign.html

Thomas Matthews
Another site (data driven programming): http://ai.eecs.umich.edu/soar/Classes/494/talks/Schumaker.pdf
Thomas Matthews
A: 

You can use the Windows High Performance Counter (QueryPerformanceCounter) to get high-resolution timings. Technically, AFAIK, the counter can run at any frequency, but you can query its counts per second with QueryPerformanceFrequency, and on most CPUs it runs at a very high frequency.
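
A minimal Windows-only sketch, assuming QueryPerformanceCounter/QueryPerformanceFrequency; the Sleep call just stands in for the code under test:

    // Hedged sketch: timing a code section with the Windows High
    // Performance Counter. Windows-only.
    #include <windows.h>
    #include <cstdio>

    int main() {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);   // counts per second
        QueryPerformanceCounter(&start);

        Sleep(10);                          // placeholder for the code under test

        QueryPerformanceCounter(&end);
        double seconds = double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
        printf("elapsed: %f s\n", seconds);
    }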

Ideally, you should just get a professional profiler; that's what they're for. More realistically, however:

If you're only comparing algorithms, then as long as your machine doesn't happen to excel in one particular area (Pentium D, SSD, that sort of thing), it shouldn't matter too much to do it on just one machine. If you want to look at cache effects, try running the algorithm right after the machine starts up (make sure that you get a copy of Windows 7; it should be free for CS students), then leave it doing something plenty cache-heavy, like image processing, for 24 hours or so to convince the OS to cache it. Then run the algorithm again and compare.

DeadMG
The claim that the architecture/machine doesn't matter is wrong; every architecture is different because of improvements in technology over time. The claim that the benchmark needs a day for the cache to "fill up" is also false; the L1/L2 caches fill in a few clock cycles/microseconds, so there is no need to "let it run a decade".
Quonux
You totally misunderstood my post. Every architecture is different, but the algorithms won't be. If the ratios of its performance characteristics are roughly the same, the relative performance of the algorithms will stay the same. In addition, filling the L1/L2 caches is about convincing the OS to do it, not asking the CPU to do it.
DeadMG
The OS is responsible for filling its own file caches - in main memory. The L1/L2 caches are part of the CPU and are managed by it.
Jørgen Fogh
A: 

You didn't specify your platform. If you are on a POSIX system (e.g. Linux), have a look at clock_gettime. It lets you access different kinds of clocks, e.g. wall-clock time or CPU time. You can also query the precision of the clocks (clock_getres).
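
A minimal POSIX sketch, assuming CLOCK_MONOTONIC for wall-clock time and CLOCK_PROCESS_CPUTIME_ID for CPU time (on older glibc you may need to link with -lrt); the loop is a placeholder workload:

    // Hedged sketch: wall-clock and per-process CPU time via clock_gettime.
    #include <time.h>
    #include <cstdio>

    static double elapsed(const timespec &a, const timespec &b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9;
    }

    volatile double sink;

    int main() {
        timespec res, w0, w1, c0, c1;
        clock_getres(CLOCK_MONOTONIC, &res);             // clock precision
        printf("resolution: %ld ns\n", (long)res.tv_nsec);

        clock_gettime(CLOCK_MONOTONIC, &w0);             // wall-clock time
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);    // CPU time
        for (int i = 0; i < 1000000; ++i) sink += i;     // placeholder workload
        clock_gettime(CLOCK_MONOTONIC, &w1);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

        printf("wall: %f s, cpu: %f s\n", elapsed(w0, w1), elapsed(c0, c1));
    }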

Since you are willing to do good statistics on your numbers, you should repeat your experiments often enough that the statistical tests give you enough confidence.

If your measurements are not too fine-grained and your variance is low, 10 probes or so are often quite good. But if you go down to a small scale, a short function or so, you might need to go much higher.
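
As a rough sanity check before handing the samples to R, you can compute the sample mean and standard deviation yourself; a small sketch with placeholder values:

    // Hedged sketch: sample mean and standard deviation of timing probes,
    // to judge whether more runs are needed. Values are placeholders.
    #include <cmath>
    #include <vector>
    #include <cstdio>

    int main() {
        std::vector<double> t = {0.0121, 0.0119, 0.0125, 0.0118, 0.0122,
                                 0.0120, 0.0124, 0.0119, 0.0121, 0.0123};
        double sum = 0.0;
        for (double x : t) sum += x;
        double mean = sum / t.size();

        double sq = 0.0;
        for (double x : t) sq += (x - mean) * (x - mean);
        double stddev = std::sqrt(sq / (t.size() - 1));   // sample standard deviation

        printf("mean = %g s, stddev = %g s (n = %zu)\n", mean, stddev, t.size());
    }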

Also, you would have to ensure reproducible experimental conditions: no other load on the machine, enough memory available, etc.

Jens Gustedt
+1  A: 

Profilers are great. Valgrind is pretty popular. Also, I'd suggest trying your code out on RISC machines if you can get access to some. Their performance characteristics differ from those of CISC machines in interesting ways.

Nathon