Given that I don't know at deployment time what kind of system my code will be running on, how do I write a performance benchmark that uses the potential of a system as its yardstick?

What I mean is that if a system is capable of running the piece of code 1000 times per second, I'd like the test to ensure that it comes in as close to 1000 as possible. If it can only do 500, then that's the rate I'd like to compare against.

If it helps in making the answer more specific, I'm using JUnit4.

Thank you.

+6  A: 

I would not use unit testing for performance tests for a couple of reasons.

First, unit tests should not have dependencies on the surrounding system/code. Performance tests depend heavily on the hardware/OS, so it is hard to get uniform measures that will be usable on developer workstations, the build server and so on.

Second, unit tests should execute really fast. When you do performance tests, you usually want quite large data sets and you repeat the runs a number of times in order to average the numbers and get rid of overhead. All of this works against the idea of fast tests.
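
If slow measurement tests do end up living next to the unit tests, one option is to tag them with JUnit's @Category annotation (available since JUnit 4.8) so the fast suite can exclude them. A rough sketch, where the PerformanceTests marker interface and the WidgetPerformanceTest class are just made-up names; the suite or build configuration (for example a Categories suite or Surefire's excludedGroups) would then skip that category:

import org.junit.Test;
import org.junit.experimental.categories.Category;

public class WidgetPerformanceTest {

    // Hypothetical marker interface used only to tag slow performance tests.
    public interface PerformanceTests {}

    @Category(PerformanceTests.class)
    @Test
    public void largeDataSetThroughput() {
        // the long-running, repeated measurement lives here; the fast unit
        // test suite is configured to exclude the PerformanceTests category
    }
}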

Brian Rasmussen
Good answer; not to my question, but a good answer nonetheless.
Allain Lalonde
Please comment when down voting.
Brian Rasmussen
A: 

I do some time measurements on tests for code that is destined for a real-time system, where a correct answer that took too long to calculate is a failure.

All I do is plot the delta in CPU time that the test took across recent builds. Note: CPU time, not real time. The actual value doesn't matter too much; what matters is how much it changed.

If I commit a change to an algorithm that significantly changes the run time for the test, I can easily zoom in on the specific changeset that caused it. What I really care about are these points of interest, not necessarily the absolute values. There are quite often many tradeoffs in a real-time system, and these can't always be expressed to the test framework as a simple comparison.
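
In Java, one way to get at CPU time rather than wall-clock time is the ThreadMXBean from java.lang.management. Here is a rough sketch of logging it so a build server can plot it per changeset; runAlgorithmUnderTest and the "PERF" output format are just placeholders, and note that thread CPU timing is not supported on every JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

import org.junit.Test;

public class AlgorithmCpuTimeTest {

    @Test
    public void recordCpuTimeForTrending() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long start = threads.getCurrentThreadCpuTime(); // nanoseconds of CPU time, not wall clock

        runAlgorithmUnderTest();

        long cpuNanos = threads.getCurrentThreadCpuTime() - start;
        // Print a line the build server can scrape and plot per changeset.
        // No assertion here: the point is trending, not pass/fail.
        System.out.println("PERF algorithm.cpuNanos=" + cpuNanos);
    }

    // Placeholder for the real algorithm whose cost you want to track.
    private void runAlgorithmUnderTest() {
        double x = 0;
        for (int i = 0; i < 1000000; i++) {
            x += Math.sin(i);
        }
    }
}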

Looking at absolute times and normalizing them seems reasonable at first, but in reality the conversion between your system and the target system will be non-linear: cache pressure, swap usage, disk speed and so on may cause the time for the test to explode at a different threshold on the target system than on yours.

If you absolutely need a test that is accurate in this regard, duplicate the target system and use it as a test slave, in an environment similar to the one you expect it to run in.

In my case that may actually mean downloading firmware to a DSP, remotely power-cycling it, reading the response from a serial port, or seeing no response because it crashed!

--jeffk++

jdkoftinoff
+5  A: 

A test means you have a pass/fail threshold. For a performance test, this means too slow and you fail, fast enough and you pass. If you fail, you start doing rework.

If you can't fail, then you're benchmarking, not actually testing.

When you talk about what a "system is capable of running", you have to define "capable". You could use any of a large number of hardware performance benchmarks; Whetstone, Dhrystone, etc., are popular. Or perhaps you have a database-intensive application, in which case you might want to look at the TPC benchmarks. Or perhaps you have a network-intensive application and want to use netperf. Or a GUI-intensive application and want to use some kind of graphics benchmark.

Any of these give you some kind of "capability" measurement. Pick one or more. They're all good. Equally debatable. Equally biased toward your competitor and away from you.

Once you've run the benchmark, you can then run your software and see what the system actually does.

You could -- if you gather enough data -- establish some correlation between some benchmark numbers and your performance numbers. You'll see all kinds of variation based on workload, hardware configuration, OS version, virtual machine, DB server, etc.

With enough data from enough boxes with enough different configurations, you will eventually be able to develop a performance model that says "given this hardware, software, tuning parameters and configuration, I expect my software to do [X] transactions per second." That's a solid definition of "capable".

Once you have that model, you can then compare your software against the capability number. Until you have a very complete model, you don't really know which systems are even capable of running the piece of code 1000 times per second.
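
A minimal sketch of what that comparison could look like in JUnit 4. The calibrationScore() loop is only a crude stand-in for a real benchmark such as Whetstone or Dhrystone, and EXPECTED_OPS_PER_SCORE_UNIT is a number you would have to derive from your own performance model; doStuff() is a placeholder for the code under test:

import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class RelativeThroughputTest {

    // Hypothetical factor taken from your own performance model: how many
    // doStuff() calls per second you expect per unit of calibration score.
    private static final double EXPECTED_OPS_PER_SCORE_UNIT = 2.0;

    @Test
    public void throughputScalesWithMachineCapability() {
        double score = calibrationScore();
        double expected = score * EXPECTED_OPS_PER_SCORE_UNIT;

        long deadline = System.currentTimeMillis() + 1000;
        long iterations = 0;
        while (System.currentTimeMillis() < deadline) {
            doStuff();
            iterations++;
        }

        // Leave some slack; the model is only an estimate of "capable".
        assertTrue("expected roughly " + expected + " iterations, got " + iterations,
                   iterations >= expected * 0.8);
    }

    // Very crude stand-in for a real capability benchmark: count how much
    // floating-point work this box gets through in about 100 ms.
    private double calibrationScore() {
        long deadline = System.currentTimeMillis() + 100;
        double x = 1.0;
        long ops = 0;
        while (System.currentTimeMillis() < deadline) {
            x = Math.sqrt(x + ops);
            ops++;
        }
        return ops / 1000.0 + (x * 0.0); // use x so the loop is not optimized away
    }

    private void doStuff() {
        // placeholder for the code under test
    }
}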

S.Lott
+1  A: 

I agree with Brian when he says that unit tests are not the appropriate way to do performance testing. However, I put together a short example that could be used as an integration test to run on different system configurations/environments.
Note that this is just to give an idea of what could be done in this regard; it does not provide results that are precise enough to back up any official statement about the performance of a system.

package com.stackoverflow.samples.tests;

import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class DoStuffPerformanceTest {

    @Test
    public void doStuffRuns500TimesPerSecond() {
        long maximumRunningTime = 1000; // measure for roughly one second
        long currentRunningTime = 0;
        int iterations = 0;

        do {
            long startTime = System.currentTimeMillis();

            // do stuff

            currentRunningTime += System.currentTimeMillis() - startTime;
            iterations++;
        } while (currentRunningTime <= maximumRunningTime);

        // an exact equality check would almost never pass; require at least the target rate
        assertTrue(iterations >= 500);
    }
}
Enrico Campidoglio