What are some good tips and/or techniques for optimizing and improving the performance of calculation-heavy programs? I'm talking about things like complicated graphics calculations or mathematical and simulation types of programming, where every second saved is useful, as opposed to IO-heavy programs where only a certain amount of speedup is helpful.

While changing the algorithm is frequently mentioned as the most effective method here, I'm trying to find out how effective different algorithms are in the first place, so I want to get as much efficiency out of each algorithm as possible. The "problem" I'm solving isn't something that's well known, so there are few if any algorithms on the web, but I'm looking for any good advice on how to proceed and what to look for.

I am exploring the differences in effectiveness between evolutionary algorithms and more straightforward approaches for a particular group of related problems. I have written three evolutionary algorithms for the problem already, and now I have written a brute-force technique that I am trying to make as fast as possible.

Edit: To specify a bit more: I am using C#, and my algorithms all revolve around calculating and solving constraint-type problems for expressions (using expression trees). By expressions I mean things like x^2 + 4, or anything else like that which would be parsed into an expression tree. My algorithms all create and manipulate these trees to try to find better approximations. But I wanted to put the question out there in a general way in case it would help anyone else.
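For concreteness, here is a minimal sketch of the kind of expression tree I mean (written in Java here rather than C#, and all class names are illustrative, not from my actual code):

```java
// Minimal expression-tree sketch: represents and evaluates x^2 + 4.
abstract class Expr {
    abstract double eval(double x);
}

class Const extends Expr {
    final double value;
    Const(double value) { this.value = value; }
    double eval(double x) { return value; }
}

class Var extends Expr {
    double eval(double x) { return x; }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    double eval(double x) { return left.eval(x) + right.eval(x); }
}

class Mul extends Expr {
    final Expr left, right;
    Mul(Expr left, Expr right) { this.left = left; this.right = right; }
    double eval(double x) { return left.eval(x) * right.eval(x); }
}

public class ExprDemo {
    public static void main(String[] args) {
        // x^2 + 4, built as (x * x) + 4
        Expr tree = new Add(new Mul(new Var(), new Var()), new Const(4));
        System.out.println(tree.eval(3)); // 13.0
    }
}
```

The evolutionary algorithms mutate and recombine trees of this shape; the brute-force version enumerates them.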

I am trying to find out if it is possible to write a useful evolutionary algorithm for finding expressions that are a good approximation for various properties. Both because I want to know what a good approximation would be and to see how the evolutionary stuff compares to traditional methods.

+3  A: 

It's pretty much the same process as any other optimization: profile, experiment, benchmark, repeat.

First you have to figure out what sections of your code are taking up the time. Then try different methods to speed them up (trying methods based on merit would be a better idea than trying things at random). Benchmark to find out if you actually did speed them up. If you did, replace the old method with the new one. Profile again.
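As a sketch of the benchmark step (in Java for concreteness; the workload and iteration counts are made up), a minimal timing harness looks roughly like this:

```java
// Minimal micro-benchmark sketch: time a candidate implementation,
// keeping the best of several runs. The workload is a stand-in.
public class BenchDemo {
    static long sumWithDivision(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += 1_000_000 / i;
        return s;
    }

    static long timeIt(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm up first (important on JIT-compiled runtimes like the JVM
        // or .NET), then measure several runs and keep the fastest.
        for (int i = 0; i < 5; i++) sumWithDivision(1_000_000);
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++)
            best = Math.min(best, timeIt(() -> sumWithDivision(1_000_000)));
        System.out.println("best run: " + best + " ns");
    }
}
```

For serious measurements a dedicated harness is preferable, since warm-up effects and dead-code elimination can distort naive timings like this one.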

Chad Birch
A: 

Well, how you do this depends mostly on which language you are using. Still, the key in any language is the profiler. Profile your code. See which functions/operations are taking the most time, and then determine whether you can make these costly operations more efficient.

Standard bottlenecks in numerical algorithms are memory usage (do you access matrices in the order in which the elements are stored in memory?), communication overhead, etc. These can be a little different from the bottlenecks of non-numerical programs.
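To illustrate the storage-order point, here is a small sketch (in Java; the array sizes are arbitrary). Both methods compute the same sum, but the row-order loop walks memory sequentially and is typically much faster on large matrices:

```java
// Cache-friendly vs cache-hostile traversal of a row-major 2D array.
public class TraversalDemo {
    static double sumRowOrder(double[][] m) {
        double s = 0;
        for (int i = 0; i < m.length; i++)        // outer loop: rows
            for (int j = 0; j < m[i].length; j++) // inner loop: contiguous elements
                s += m[i][j];
        return s;
    }

    static double sumColumnOrder(double[][] m) {
        double s = 0;
        for (int j = 0; j < m[0].length; j++)     // outer loop: columns
            for (int i = 0; i < m.length; i++)    // inner loop: strides across rows
                s += m[i][j];
        return s;
    }

    public static void main(String[] args) {
        double[][] m = new double[1000][1000];
        for (int i = 0; i < 1000; i++)
            for (int j = 0; j < 1000; j++)
                m[i][j] = 1.0;
        System.out.println(sumRowOrder(m));    // 1000000.0
        System.out.println(sumColumnOrder(m)); // same sum, worse access pattern
    }
}
```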

Moreover, many other factors, such as preconditioning, can lead to drastically different performance behavior of the SAME algorithm on the same problem. Make sure you determine optimal parameters for your implementations.

As for comparing different algorithms, I recommend reading the paper

"Benchmarking optimization software with performance profiles," Jorge Moré and Elizabeth D. Dolan, Mathematical Programming 91 (2002), 201-213.

It provides a nice, uniform way to compare different algorithms being applied to the same problem set. It really should be better known outside of the optimization community (in my not so humble opinion at least).

Good luck!

SplittingField
+1  A: 

I would recommend against a brute force approach if it's at all possible to do it some other way. But, here are some guidelines that should help you speed your code up either way.

There are many, many different optimizations you could apply to your code, but before you do anything, you should profile to figure out where the bottleneck is. Here are some profilers that should give you a good idea about where the hot spots are in your code:

These all use sampling to get their data, so the overhead of running them with your code should be minimal. Only gprof requires that you recompile your code. Also, the last three let you do both time and hardware performance counter profiles, so once you do a time (or CPU cycle) profile, you can zoom in on the hotter regions and find out why they might be running slowly (cache misses, FP instruction counts, etc.).

Beyond that, it's a matter of thinking about how best to restructure your code, and this depends on what the problem is. It may be that you've just got a loop that the compiler doesn't optimize well, and you can inline or move things in/out of the loop to help the compiler out. Or, if you're running as fast as you can with basic arithmetic ops, you may want to try to exploit vector instructions (SSE, etc.). If your code is parallel, you might have load-balance problems, and you may need to restructure your code so that data is better distributed across cores.
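As one concrete example of helping the compiler with a loop, hoisting loop-invariant work out of the body is often the first thing to try. A sketch in Java (the scale expression is made up):

```java
// Loop-invariant hoisting: move unchanging work out of the hot loop.
public class HoistDemo {
    static double[] scaleNaive(double[] data, double angle) {
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++)
            out[i] = data[i] * Math.sin(angle) / Math.sqrt(2.0); // recomputed every pass
        return out;
    }

    static double[] scaleHoisted(double[] data, double angle) {
        double factor = Math.sin(angle) / Math.sqrt(2.0); // computed once
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++)
            out[i] = data[i] * factor;
        return out;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        double[] a = scaleNaive(data, 0.5);
        double[] b = scaleHoisted(data, 0.5);
        System.out.println(a[0] == b[0]); // true for this element; far less work per pass
    }
}
```

Optimizing compilers often do this themselves, but calls they cannot prove side-effect free (or anything hidden behind a virtual call) frequently block it, so doing it by hand is cheap insurance.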

These are just a few examples. Performance optimization is complex, and it might not help you nearly enough if you're doing a brute force approach to begin with.

For more information on ways people have optimized things, there were some pretty good examples in the recent Why do you program in assembly? question.

tgamblin
+2  A: 

If your optimization problem is (quasi-)convex or can be transformed into such a form, there are far more efficient algorithms than evolutionary search.
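For instance, if the objective is one-dimensional and unimodal, plain ternary search already finds the minimum in a logarithmic number of evaluations, with no population or mutation needed. A sketch in Java (the function and interval are illustrative):

```java
// Ternary search sketch: minimizes a 1-D unimodal (e.g. convex) function
// on [lo, hi] using O(log((hi - lo) / eps)) function evaluations.
public class TernaryDemo {
    static double minimize(java.util.function.DoubleUnaryOperator f,
                           double lo, double hi, double eps) {
        while (hi - lo > eps) {
            double m1 = lo + (hi - lo) / 3; // two interior probe points
            double m2 = hi - (hi - lo) / 3;
            if (f.applyAsDouble(m1) < f.applyAsDouble(m2))
                hi = m2; // minimum cannot lie in (m2, hi]
            else
                lo = m1; // minimum cannot lie in [lo, m1)
        }
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        // Convex example: (x - 2)^2 + 1, minimized at x = 2.
        double x = minimize(v -> (v - 2) * (v - 2) + 1, -10, 10, 1e-9);
        System.out.println(x); // approximately 2.0
    }
}
```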

If you have large matrices, pay attention to your linear algebra routines. The right algorithm can shave an order of magnitude off the computation time, especially if your matrices are sparse.

Think about how data is loaded into memory. Even when you think you're spending most of your time on pure arithmetic, you're actually spending a lot of time moving things between levels of cache etc. Do as much as you can with the data while it's in the fastest memory.
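One way to act on this, sketched in Java: fuse separate passes over the same data into a single pass, so each element is consumed while it is still in cache (or a register).

```java
// Loop-fusion sketch: compute two statistics in one pass over the data
// instead of streaming the whole array through memory twice.
public class FusionDemo {
    // Two separate passes: every element is fetched from memory twice.
    static double twoPasses(double[] a) {
        double sum = 0, sumSq = 0;
        for (double x : a) sum += x;
        for (double x : a) sumSq += x * x;
        return sum + sumSq;
    }

    // One fused pass: same results, one trip over memory instead of two.
    static double onePass(double[] a) {
        double sum = 0, sumSq = 0;
        for (double x : a) { sum += x; sumSq += x * x; }
        return sum + sumSq;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        System.out.println(twoPasses(a)); // 20.0 (sum 6 + sum of squares 14)
        System.out.println(onePass(a));   // 20.0
    }
}
```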

Try to avoid unnecessary memory allocation and de-allocation. Here's where it can make sense to back away from a purely OO approach.
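A small sketch of the allocation point (in Java; buffer size and fill values are arbitrary): hoist scratch storage out of the hot loop and reuse it, rather than constructing a fresh object per iteration.

```java
// Buffer-reuse sketch: allocate scratch space once, outside the hot loop.
public class ReuseDemo {
    // Sums 'iters' passes over a reused scratch buffer of length 'len'.
    static double run(int iters, int len) {
        double[] scratch = new double[len]; // allocated once, not per iteration
        double total = 0;
        for (int iter = 0; iter < iters; iter++) {
            // Refill the same buffer in place; no 'new' inside the loop,
            // so no allocator or garbage-collector pressure.
            for (int i = 0; i < len; i++) scratch[i] = i * 0.5;
            for (double v : scratch) total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 256));
    }
}
```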

John D. Cook
I would go even further and say that for almost any optimization problem (excepting things like SAT and TSP) there are far more efficient algorithms than evolutionary search.
SplittingField
+1  A: 

This is more of a tip to find holes in the algorithm itself...

To realize maximum performance, simplify everything inside the innermost loop at the expense of everything else.

One example of keeping things simple is the classic bouncing ball animation. You can implement gravity by looking up the definition in your physics book and plugging in the numbers, or you can do it like this and save precious clock cycles:

initialize:
float y = 0;   // vertical position
float yi = 0;  // velocity accumulated per frame

loop:
y += yi;       // integrate position
yi += 0.001f;  // constant per-frame acceleration stands in for gravity

if (y > 10)
  yi = -yi;    // bounce: reverse the velocity

But now let's say you're having to do this with nested loops in an N-body simulation where every particle is attracted to every other particle. This can be an enormously processor intensive task when you're dealing with thousands of particles.

You should of course take the same approach and simplify everything inside the innermost loop. But more than that, at the very simplest level you should also use data types wisely. For example, math operations are often faster on integers than on floating-point variables. Also, addition is generally faster than multiplication, and multiplication is faster than division.

So with all of that in mind, you should be able to simplify the innermost loop using primarily addition and multiplication of integers. Then any scaling down you might need to do can be done afterwards. To take the y and yi example: if yi is an integer that you modify inside the inner loop, then you could scale it down after the loop like this:

y += yi * 0.01;
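Putting that together: accumulate in scaled integer units inside the loop and convert to floating point once at the end. A sketch in Java (the 0.01 scale factor matches the example above; the step count is arbitrary):

```java
// Fixed-point sketch: pure integer arithmetic in the hot loop,
// a single floating-point scale-down afterwards. 1 unit = 0.01.
public class FixedPointDemo {
    // Returns the final y after 'steps' iterations.
    static double simulate(int steps) {
        int yiUnits = 0;  // velocity, in hundredths
        long yUnits = 0;  // position, in hundredths
        for (int step = 0; step < steps; step++) {
            yiUnits += 1;     // integer addition only inside the loop
            yUnits += yiUnits;
        }
        return yUnits * 0.01; // one scale-down after the loop
    }

    public static void main(String[] args) {
        System.out.println(simulate(100)); // 5050 units, i.e. 50.5
    }
}
```

Whether this actually wins on a modern CPU depends on the hardware and the runtime, so it is worth benchmarking rather than assuming.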

These are very basic low-level performance tips, but they're all things I try to keep in mind whenever I'm working with processor intensive algorithms. Of course, if you then take these ideas and apply them to parallel processing on a GPU then you can take your algorithm to a whole new level. =)

Steve Wortham