I'm working on a project which is in serious need of some performance tuning.

How do I write a test that fails if my optimizations do not improve the speed of the program?

To elaborate a bit:

The problem is not discovering which parts to optimize. I can use various profiling and benchmarking tools for that.

The problem is using automated tests to document that a specific optimization did indeed have the intended effect. It would also be highly desirable if I could use the test suite to discover possible performance regressions later on.

I suppose I could just run my profiling tools to get some values and then assert that my optimized code produces better values. The obvious problem with that, however, is that benchmarking values are not hard values. They vary with the local environment.

So, is the answer to always use the same machine to do this kind of integration testing? If so, you would still have to allow for some fuzziness in the results, since even on the same hardware benchmarking results can vary. How then to take this into account?

Or maybe the answer is to keep older versions of the program and compare results before and after? This would be my preferred method, since it's mostly environment agnostic. Does anyone have experience with this approach? I imagine it would only be necessary to keep one older version if all the tests can be made to pass if the performance of the latest version is at least as good as the former version.

+2  A: 

First you need to establish some criteria for acceptable performance, then you need to devise a test that will fail that criteria when using the existing code, then you need to tweak your code for performance until it passes the test. You will probably have more than one criteria for performance, and you should certainly have more than one test.

Ed Guiness
+4  A: 

Record the running time of the current code.

if (newCode.RunningTime >= oldCode.RunningTime) Fail
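A runnable sketch of that one-liner in Python (old_code and new_code are hypothetical stand-ins; taking the best of several runs reduces clock noise):

```python
import time

def time_it(fn, repeats=5):
    """Return the best-of-N wall-clock time for fn(), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def old_code():
    # Naive O(n) summation.
    sum(range(200_000))

def new_code():
    # Closed-form version of the same computation.
    n = 200_000
    _ = n * (n - 1) // 2

def test_new_code_is_faster():
    assert time_it(new_code) < time_it(old_code), "optimization did not help"

test_new_code_is_faster()
```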
Rik
+1  A: 

In many server applications (this might not be your case), performance problems manifest only under concurrent access and under load. Measuring the absolute time a routine takes and trying to improve it is therefore not very helpful. This method has problems even in single-threaded applications: measuring absolute routine time relies on the clock the platform provides, and these clocks are not always very precise; you are better off relying on the average time a routine takes.

My advice is:

  • Use profiling to identify the routines that execute most often and take the most time.
  • Use a tool like JMeter or Grinder to build representative test cases, simulate concurrent access, put your application under stress, and measure, most importantly, throughput and average response time. This will give you a better idea of how your application behaves as seen from the outside.
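The load-testing idea above can be sketched without JMeter, as a minimal Python example (the workload in handle_request is a hypothetical stand-in for a real server call):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request():
    # Stand-in for a real server call (hypothetical workload).
    return sum(i * i for i in range(10_000))

def load_test(workers=8, requests=200):
    """Fire `requests` calls from `workers` threads; return (throughput, avg latency)."""
    latencies = []

    def timed_call():
        start = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - start)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(requests):
            pool.submit(timed_call)
    elapsed = time.perf_counter() - start
    return requests / elapsed, sum(latencies) / len(latencies)

throughput, avg_latency = load_test()
print(f"{throughput:.0f} req/s, avg latency {avg_latency * 1000:.2f} ms")
```

Real tools add ramp-up, think times, and percentile reporting, but the two numbers above (throughput and average response time) are the ones the answer recommends watching.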

While you could use unit tests to establish some non-functional aspects of your application, I think the approach given above will give better results during the optimization process. When placing time-related assertions in your unit tests you will have to choose some very approximate values: time can vary depending on the environment used to run the tests. You don't want tests to fail only because some of your colleagues are using inferior hardware.

Tuning is all about finding the right things to tune. You already have functioning code, so placing performance-related assertions a posteriori, without establishing the critical sections of the code, might lead you to waste a lot of time optimizing non-essential pieces of your application.

Dan
"time can vary depending on the environment used to run the tests. You don't want tests to fail only because some of your colleagues are using inferior hardware." That's exactly right, and partly what is causing my headache.
KaptajnKold
A: 

Run the tests plus profiling on the CI server. You can also run load tests periodically.

You are concerned about differences (as you mentioned), so it's not about defining an absolute value. Add an extra step that compares the performance measures of this run with those of the last build, and reports the differences as percentages. You can raise a red flag for significant variations in time.
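That comparison step can be sketched as follows (the measure names and numbers are hypothetical; in a real CI setup previous would be loaded from the last build's stored results):

```python
TOLERANCE = 0.20  # flag regressions worse than 20%

def compare_builds(previous, current, tolerance=TOLERANCE):
    """Return a message for each measure that regressed more than `tolerance`."""
    flags = []
    for name, old in previous.items():
        new = current.get(name)
        if new is None:
            continue  # measure not recorded in this build
        diff = (new - old) / old
        if diff > tolerance:
            flags.append(f"{name}: {diff:+.0%} slower than last build")
    return flags

# Hypothetical measurements, in seconds, from the previous and current builds:
previous = {"parse": 0.80, "render": 1.20}
current = {"parse": 1.10, "render": 1.15}
print(compare_builds(previous, current))
```

Here only "parse" would be flagged; "render" got slightly faster and passes silently.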

If you are concerned about performance, you should have clear goals you want to meet, and assert them. You should measure those with tests on the full system: even if your application logic is fast, you might have issues in the view layer that cause you to miss the goal. You can combine this with the differences approach, but for these goals you would allow less tolerance for time variations.

Note that you can run the same process on your dev machine, comparing only against previous runs on that machine rather than against a baseline shared between developers.

eglasius
+2  A: 

I suspect that applying TDD to drive performance is a mistake. By all means, use it to get to good design and working code, and use the tests written in the course of TDD to ensure continued correctness - but once you have well-factored code and a solid suite of tests, you are in good shape to tune, and different (from TDD) techniques and tools apply.

TDD gives you good design, reliable code, and a test-coverage safety net. That puts you in a good place for tuning, but I think that because of the problems you and others have cited, it's simply not going to take you much further down the tuning road. I say that as a great fan, proponent, and practitioner of TDD.

Carl Manaster
+1 agreed. TDD ensures that when you do tune the system, you don't break the functionality
MatthieuF
A: 

For the tuning itself, you can compare the old code and new code directly. But don't keep both copies around; that sounds like a nightmare to manage. Also, you're only ever comparing one version with another version. It's possible that a change in functionality will slow down your function, and that this is acceptable to the users.

Personally, I've never seen performance criteria of the type 'must be faster than the last version', because it is so hard to measure.

You say 'in serious need of performance tuning'. Where? Which queries? Which functions? Who says, the business, the users? What is acceptable performance? 3 seconds? 2 seconds? 50 milliseconds?

The starting point for any performance analysis is to define the pass/fail criteria. Once you have this, you CAN automate the performance tests.

For reliability, you can use a (simple) statistical approach. For example, run the same query under the same conditions 100 times. If 95% of them return in under n seconds, that is a pass.
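That statistical pass/fail criterion might be sketched like this (run_query is a hypothetical stand-in for the query under test, and the 0.5-second limit is an assumed spec):

```python
import time

def run_query():
    # Stand-in for the query under test (hypothetical workload).
    sum(i for i in range(50_000))

def percentile_passes(fn, runs=100, quantile=0.95, limit_seconds=0.5):
    """Pass if `quantile` of `runs` executions finish within `limit_seconds`."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    times.sort()
    cutoff = times[int(quantile * runs) - 1]  # e.g. the 95th fastest of 100
    return cutoff <= limit_seconds

assert percentile_passes(run_query), "95th percentile exceeded the limit"
```

Ignoring the slowest 5% makes the test robust against the occasional GC pause or OS hiccup that would make a single-run assertion flaky.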

Personally, I would do this at integration time, from either a standard machine, or the integration server itself. Record the values for each test somewhere (cruise control has some nice features for this sort of thing). If you do this, you can see how performance progresses over time, and with each build. You can even make a graph. Managers like graphs.

Having a stable environment is always hard when doing performance testing, whether or not your tests are automated. You'll have that particular problem no matter how you develop (TDD, waterfall, etc.).

MatthieuF
"I've never seen performance criteria of the type 'must be faster than the last version'" The WebKit folks have a zero-tolerance policy for performance regressions.
KaptajnKold
@KaptajnKold That's cool. That means I've seen it once :-)
MatthieuF
A: 

I haven't faced this situation yet ;) However, if I did, here's how I'd go about it. (I think I picked this up from Dave Astels's book.)

Step #1: Come up with a spec for 'acceptable performance'. For example, this could mean 'The user needs to be able to do Y in N secs (or millisecs)'.
Step #2: Now write a failing test. Use your friendly timer class (e.g. .NET has the Stopwatch class) and Assert.Less(actualTime, MySpec).
Step #3: If the test already passes, you're done. If it's red, you need to optimize to make it green. As soon as the test goes green, the performance is 'acceptable'.
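The three steps above can be sketched in Python, using time.perf_counter in place of .NET's Stopwatch (the operation do_y and the 1-second spec are hypothetical):

```python
import time

ACCEPTABLE_SECONDS = 1.0  # Step #1: the spec, e.g. "the user can do Y within 1 second"

def do_y():
    # Stand-in for the operation named in the spec.
    sorted(range(100_000))

def test_do_y_meets_spec():
    # Step #2: time the operation and assert it beats the spec.
    start = time.perf_counter()
    do_y()
    actual = time.perf_counter() - start
    assert actual < ACCEPTABLE_SECONDS, f"took {actual:.3f}s, spec is {ACCEPTABLE_SECONDS}s"

# Step #3: red means optimize; green means performance is 'acceptable'.
test_do_y_meets_spec()
```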

Gishu