Are there good ways to measure how a function performs in C? I would like to, for example, compare my own function to a library function.
Store off the system time before you enter the function. Store off the system time after you return from the function. Take the difference and compare the two implementations.
Store timestamp before entering function
Store timestamp after exiting function
Compare timestamps
Make sure to use a significant sample, as the timer resolution might skew your results. This is especially true for short-duration functions. Use high-resolution timers (microsecond resolution is available on most platforms).
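A minimal sketch of that approach using the standard clock() function (this measures CPU time; function_to_test() and the iteration count are placeholders, not anything from the question):

#include <stdio.h>
#include <time.h>

clock_t before = clock();
for (int i = 0; i < 1000000; i++)   /* repeat enough times to get a measurable duration */
    function_to_test();             /* placeholder for the function under test */
clock_t after = clock();
printf("CPU time: %.3f seconds\n", (double)(after - before) / CLOCKS_PER_SEC);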
Run it (them) several million times (each) and measure the time it takes.
The one that completes faster is the better performer.
gprof can help :)
Here's the result of gprof when I run a program of mine for 10 seconds (function names changed)
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
60.29      8.68      8.68 115471546     0.00     0.00  workalot
39.22     14.32      5.64        46   122.70   311.32  work_b
 0.49     14.39      0.07                              inlined
 0.07     14.40      0.01        46     0.22     0.22  work_c
 0.00     14.40      0.00       460     0.00     0.00  find_minimum
 0.00     14.40      0.00       460     0.00     0.00  feedback
 0.00     14.40      0.00        46     0.00     0.00  work_a
You need high-resolution timers.
On Linux, gettimeofday() is a decent choice; it gives you microsecond resolution. On Windows, QueryPerformanceCounter() is typical. Make sure you run your function many times to get stable readings.
Quick sample, for Linux:
#include <stdio.h>
#include <sys/time.h>

struct timeval t0, t1;
unsigned int i;

gettimeofday(&t0, NULL);
for(i = 0; i < 100000; i++)
    function_to_measure();
gettimeofday(&t1, NULL);
printf("Did %u calls in %.2g seconds\n", i, t1.tv_sec - t0.tv_sec + 1E-6 * (t1.tv_usec - t0.tv_usec));
You would of course adjust the count (100,000) to match the performance of the function. It's best if the function really takes a while to run; otherwise the loop and/or the function-call overhead might dominate.
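If that overhead is a concern, one hedged way to estimate it is to time an empty function with the same loop and subtract the result (empty_function() is a placeholder, and an optimizing compiler may remove the empty call entirely unless you prevent inlining):

void empty_function(void) { }  /* defined at file scope; times only loop and call overhead */

gettimeofday(&t0, NULL);
for(i = 0; i < 100000; i++)
    empty_function();
gettimeofday(&t1, NULL);
double overhead = t1.tv_sec - t0.tv_sec + 1E-6 * (t1.tv_usec - t0.tv_usec);
/* subtract 'overhead' from the time measured for function_to_measure() */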
Check out HighResTimer for a high-performance timer.
You'll probably find that simply storing the time before and after isn't accurate enough and will often come out as 0 unless you have a longer-running function.
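For reference, a hedged sketch of the Windows approach mentioned earlier using QueryPerformanceCounter() (function_to_measure() is assumed to be defined elsewhere):

#include <windows.h>
#include <stdio.h>

LARGE_INTEGER freq, t0, t1;
QueryPerformanceFrequency(&freq);   /* counter ticks per second */
QueryPerformanceCounter(&t0);
function_to_measure();
QueryPerformanceCounter(&t1);
printf("Took %.6f seconds\n", (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart);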
Check out RDTSC, but it is better to do it like below (see the sketch after the steps).
0 - Call system's Sleep or Yield function so that when it returns, you have a new timeslice
1 - RDTSC
2 - Call your function
3 - RDTSC
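A minimal sketch of steps 0-3, assuming GCC or Clang on x86; sched_yield() stands in for the system's yield call, and timed_function() is a placeholder:

#include <stdio.h>
#include <sched.h>        /* sched_yield() */
#include <x86intrin.h>    /* __rdtsc() */

sched_yield();                          /* 0: give up the rest of the timeslice */
unsigned long long start = __rdtsc();   /* 1: read the cycle counter */
timed_function();                       /* 2: the code under test */
unsigned long long end = __rdtsc();     /* 3: read the cycle counter again */
printf("Elapsed: %llu cycles\n", end - start);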
If your function is a long-running one, you have to use some sort of profiling tool like gprof (which is very easy to use) or Intel's VTune (which I have not used for a long time). After seeing Art's answer, I changed my mind from gprof to Callgrind. In the past I used only the Memcheck tool of Valgrind, and it was a magnificent tool. I have not used Callgrind before, but I am sure it is better than gprof...
As the simplest and most portable approach, you can use the standard function time(), which returns the current number of seconds since the Epoch.
#include <stdio.h>
#include <time.h>

time_t starttime, endtime;
int i;

starttime = time(NULL);
for (i = 0; i < 1000000; i++)
{
    testfunc();
}
endtime = time(NULL);
printf("Time in seconds is %d\n", (int)(endtime - starttime));
Adjust the number of iterations to your needs. If one function call needs 5 seconds then you need a laaarge cup of coffee for 1000000 iterations... When the difference is less than 1 second, even for a large number, you should 1) ask yourself if it matters and if yes, 2) check if your favorite compiler already has builtin profiling functions.
Fred, I notice that you said in a comment that you're on OS X. The best way to get very accurate timings of small-scale functions on OS X is with the mach_absolute_time() function. You can use it as follows:
#include <mach/mach_time.h>
#include <stdint.h>
int loopCount;
uint64_t startTime = mach_absolute_time( );
for (loopCount = 0; loopCount < iterations; ++loopCount) {
functionBeingTimed( );
}
uint64_t endTime = mach_absolute_time( );
double averageTime = (double)(endTime-startTime) / iterations;
This gives you the average timing across iterations calls to the function. This can be affected somewhat by effects outside of your process on the system. Thus, you may instead want to take the fastest time:
#include <mach/mach_time.h>
#include <stdint.h>
int loopCount;
double bestTime = __builtin_inf();
for (loopCount = 0; loopCount < iterations; ++loopCount) {
uint64_t startTime = mach_absolute_time( );
functionBeingTimed( );
uint64_t endTime = mach_absolute_time( );
bestTime = __builtin_fmin(bestTime, (double)(endTime-startTime));  // keep the fastest observed time
}
This can have its own problems, especially if the function being timed is very very fast. You need to think about what you are really trying to measure and pick an approach that is scientifically justified (good experimental design is hard). I often use a hybrid between these two approaches as a first attempt at measuring a novel task (a minimum of averages over many calls).
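As a hedged sketch of that hybrid (a minimum of averages), reusing the names from the snippets above with a hypothetical outerIterations / innerIterations split:

#include <mach/mach_time.h>
#include <stdint.h>

double bestAverage = __builtin_inf();
for (int outer = 0; outer < outerIterations; ++outer) {
    uint64_t startTime = mach_absolute_time( );
    for (int inner = 0; inner < innerIterations; ++inner) {
        functionBeingTimed( );
    }
    uint64_t endTime = mach_absolute_time( );
    double average = (double)(endTime - startTime) / innerIterations;
    bestAverage = __builtin_fmin(bestAverage, average);   // fastest per-batch average
}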
Note also that in the code samples above, the timings are in "mach time units". If you just want to compare algorithms, this is usually fine. For some other purposes, you may want to convert them to nanoseconds or cycles. To do this, you can use the following functions:
#include <mach/mach_time.h>
#include <sys/sysctl.h>
#include <stdint.h>
double ticksToNanoseconds(double ticks) {
static double nanosecondsPerTick = 0.0;
// The first time the function is called
// ask the system how to convert mach
// time units to nanoseconds
if (0.0 == nanosecondsPerTick) {
mach_timebase_info_data_t timebase;
// to be completely pedantic, check the return code of this call:
mach_timebase_info(&timebase);
nanosecondsPerTick = (double)timebase.numer / timebase.denom;
}
return ticks * nanosecondsPerTick;
}
double nanosecondsToCycles(double nanoseconds) {
static double cyclesPerNanosecond = 0.0;
// The first time the function is called
// ask the system what the CPU frequency is
if (0.0 == cyclesPerNanosecond) {
uint64_t freq;
size_t freqSize = sizeof(freq);
// Again, check the return code for correctness =)
sysctlbyname("hw.cpufrequency", &freq, &freqSize, NULL, 0L );
cyclesPerNanosecond = (double)freq * 1e-9;
}
return nanoseconds * cyclesPerNanosecond;
}
Be aware that the conversion to nanoseconds will always be sound, but the conversion to cycles can go awry in various ways, because modern CPUs do not run at one fixed speed. Nonetheless, it generally works pretty well.
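For example, converting the averageTime measured earlier from mach time units:

double averageNanoseconds = ticksToNanoseconds(averageTime);
double averageCycles      = nanosecondsToCycles(averageNanoseconds);  /* approximate; see the caveat above */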
All of these other answers are using some variant of gettimeofday() for timing. This is pretty crude since you usually need to run the kernel many times to get reproducible results. Putting it in a tight loop changes the state of both the code and data caches, so these results may not be indicative of the real performance.
A much better alternative is to actually use the CPU cycle counter. On x86, you can do this with the rdtsc instruction. This is from x264:
static inline uint32_t read_time(void)
{
uint32_t a = 0;
#if defined(__GNUC__) && (defined(ARCH_X86) || defined(ARCH_X86_64))
asm volatile( "rdtsc" :"=a"(a) ::"edx" );
#elif defined(ARCH_PPC)
asm volatile( "mftb %0" : "=r" (a) );
#elif defined(ARCH_ARM) // ARMv7 only
asm volatile( "mrc p15, 0, %0, c9, c13, 0" : "=r"(a) );
#endif
return a;
}
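A hedged usage sketch: the deltas are in raw counter ticks (TSC cycles on x86), and my_function() is a placeholder:

uint32_t t0 = read_time();
my_function();
uint32_t t1 = read_time();
printf("%u ticks\n", t1 - t0);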
For more on profiling using various hardware counters, see PAPI. For some purposes, simulators (like Callgrind) and interrupt-based profilers (like OProfile) are useful.