I am taking a course on computational geometry in the fall, where we will be implementing some algorithms in C or C++ and benchmarking them. Most of the students generate a few datasets and measure their programs with the time
command, but I would like to be a bit more thorough.
I am thinking about writing a harness that automatically generates different datasets, runs my program on them, and then uses R to test hypotheses and estimate parameters.
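Roughly what I have in mind is a driver like the sketch below (everything in it — the generate_points helper, the run_algorithm stand-in, the size range, and the 30 repetitions — is just a placeholder for whatever I actually end up implementing): time only the algorithm itself with a monotonic clock and print one CSV row per run for R to pick up.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

struct Point { double x, y; };

// Placeholder dataset generator: n points uniformly distributed in the unit square.
static std::vector<Point> generate_points(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coord(0.0, 1.0);
    std::vector<Point> pts(n);
    for (auto& p : pts) { p.x = coord(rng); p.y = coord(rng); }
    return pts;
}

// Stand-in for the algorithm under test (just a sort so the sketch compiles);
// the real convex hull / triangulation / etc. code would replace this.
static double run_algorithm(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });
    return pts[pts.size() / 2].x;  // return something so the call cannot be optimized away
}

int main() {
    std::puts("n,rep,seconds");
    double sink = 0.0;
    for (std::size_t n = 1000; n <= 1000000; n *= 10) {
        for (int rep = 0; rep < 30; ++rep) {
            auto pts = generate_points(n, static_cast<unsigned>(rep));
            auto t0 = std::chrono::steady_clock::now();   // monotonic clock
            sink += run_algorithm(pts);                    // time only the algorithm, not generation or I/O
            auto t1 = std::chrono::steady_clock::now();
            std::chrono::duration<double> dt = t1 - t0;
            std::printf("%zu,%d,%.9f\n", n, rep, dt.count());
        }
    }
    std::fprintf(stderr, "checksum %g\n", sink);           // keep the measured work observable
    return 0;
}
```

In R I could then read.csv the output and, say, fit lm(log(seconds) ~ log(n)) to estimate the growth exponent, or compare variances between runs and machines. I am not sure whether this per-process timing approach is good enough, hence the questions below.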
So... how do you measure a program's running time more accurately?
What might be relevant to measure?
What hypotheses might be interesting to test (variance, effects caused by caching, etc.)?
Should I test my code on more than one machine? How should these machines differ?
My overall goals are to learn how these algorithms perform in practice, which implementation techniques work better, and how the hardware actually behaves.