Hi,

Can clock() be used as a dependable API to measure the time taken by the CPU to execute a snippet of code? When verified using times() / clock(), neither seems to measure the CPU time precisely.

Firstly, can the APIs clock()/times() be used to measure the execution time of a function or snippet of code, as in the example below? Is there a better, more dependable alternative? The mechanism has to work on Linux, HP-UX, IBM AIX and Sun Solaris, as we need to measure (and compare) the performance of a piece of code on all these platforms.

Kindly suggest. Also, please let me know if I am missing anything trivial.

bbb@m_001:/tmp/kk1$ ./perf_clock 102400
{clock(): S          0 E          0 D    0.0000000000}
bbb@m_001:/tmp/kk1$ ./perf_clock 204800
{clock(): S          0 E      10000 D    0.0100000000}
bbb@m_001:/tmp/kk1$ cat perf_clock.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

void test_clock(char* sbuf, int* len){
    clock_t start, end; int i=0;
    start = clock();
    while(i++ < 500) memset(sbuf,0,*len);
    end = clock();
    printf("{clock(): S %10lu E %10lu D %15.10f}\n",
        (unsigned long)start,(unsigned long)end,
        (end-start)/(double)CLOCKS_PER_SEC);
}

int main(int argc,char* argv[])
{
    if(argc < 2) return 1;
    int len=atoi(argv[1]);
    char *sbuf=malloc(len);
    test_clock(sbuf,&len);
    free(sbuf);
    return 0;
}

The results seem to say that memset()ing a 100 KB chunk 500 times takes no time at all. Or do they say that the time is simply not measurable in microseconds?

Actually, it is not memset() but another function [one that prepares a huge structure of around 1 MB, mallocs a copy of this structure, does an Oracle DB select and populates these structures with the data from the DB] that I am trying to measure. Even this shows 0 ticks, and that is what has kept me confused.

Thanks!

+1  A: 

You can use clock() to get the number of CPU ticks used since the program started.

Or you can use the Linux time command, e.g.: time [program] [arguments]

zipcodeman
>> you can use clock() to get the number of CPU ticks used since the program started
Then should one interpret the result as 'it took 0 CPU ticks to memset() a 100 KB chunk, 500 times'? I am a bit confused. :(
>> time command
That would measure the time taken by the program as a whole, not by a particular function/piece of code, wouldn't it?
raghava
The time command reports real, user, and system time; those measurements could be useful. But yes, it works on the program as a whole.
zipcodeman
+2  A: 

I would try to correlate the time with the shell's time command, as a sanity check.

You should also consider that the compiler may be optimizing the loop. Since the memset does not depend on the loop variable, the compiler will certainly be tempted to apply an optimization known as loop-invariant code motion and hoist or collapse the repeated calls; a sketch of one countermeasure follows.
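A minimal sketch of one way to keep the loop from being optimized away, assuming the goal is still to time 500 memsets (the function name test_clock_nodce and the checksum trick are mine, not from the original program): vary the fill value per iteration and consume the buffer afterwards so the stores stay observable.

#include <stdio.h>
#include <string.h>
#include <time.h>

void test_clock_nodce(char *sbuf, int len)
{
    unsigned long sum = 0;
    int i;
    clock_t start = clock();
    for (i = 0; i < 500; i++) {
        memset(sbuf, i & 0xff, (size_t)len);  /* fill value changes each pass  */
        sum += (unsigned char)sbuf[i % len];  /* read something back each pass */
    }
    clock_t end = clock();
    /* printing sum makes the stores observable, so the loop
       cannot be removed as dead code */
    printf("{clock(): D %15.10f checksum %lu}\n",
           (end - start) / (double)CLOCKS_PER_SEC, sum);
}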

I would also caution that a 10 MB, possibly in-cache, clear will really be 1.25 or 2.5 million CPU operations, as memset certainly writes in 4-byte or 8-byte quantities. While I rather doubt that this could be done in less than a microsecond, as stores are a bit expensive and 100 KB adds some L1 cache pressure, you are talking about not much more than one operation per nanosecond, which is not that hard to sustain for a multi-GHz CPU.

One imagines that 600 ns would round off to 1 clock tick, but I would worry about that as well.

DigitalRoss
`memset` should be using 16-byte stores on most current architectures.
Stephen Canon
Thank you, DigitalRoss. But the above snippet is just an example. I am actually trying to measure a function which prepares a huge structure (of size around 1 MB), mallocs a copy of this structure, does an Oracle DB select and populates these structures with the data from the DB. I noticed that even that is shown to take 0 ticks, as measured using clock(). A function doing I/O over the network and still shown to take 0 ticks is what has me confused. :(
raghava
+4  A: 

I would give getrusage a try and check the system and user time.

Also check with gettimeofday to compare with wall clock time.
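A minimal sketch of how that could look, assuming the same 500 × 100 KB memset workload as in the question (the helper usec_of() is mine, not part of any API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* hypothetical helper: struct timeval -> microseconds */
static long long usec_of(struct timeval tv)
{
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

int main(void)
{
    struct rusage before, after;
    char *buf = malloc(102400);
    if (!buf) return 1;

    getrusage(RUSAGE_SELF, &before);
    for (int i = 0; i < 500; i++)          /* the snippet being measured */
        memset(buf, i & 0xff, 102400);
    getrusage(RUSAGE_SELF, &after);

    printf("user %lld us, sys %lld us\n",
           usec_of(after.ru_utime) - usec_of(before.ru_utime),
           usec_of(after.ru_stime) - usec_of(before.ru_stime));
    free(buf);
    return 0;
}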

Andy
Thanks! getrusage() sounds promising. But I am doubtful about gettimeofday(); I need to measure the CPU time taken only by my process. With varying system load being a factor, measuring wall-clock time does not seem to be the proper way. Please correct me if I am wrong.
raghava
You're correct that wall-clock time is not appropriate for measuring your application; however, sometimes it can be a quick and dirty measure that might be useful.
McPherrinM
I suggested using gettimeofday for statistics (just for comparison).
Andy
+4  A: 

On recent Linuxes (*), you can get this information from the /proc filesystem. In the file /proc/PID/stat, the 14th entry has the number of jiffies used in userland code and the 15th entry has the number of jiffies used in system code.

If you want to see the data on a per-thread basis, you should reference the file /proc/PID/task/TID/stat instead.
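A minimal sketch (mine, not the answerer's) of reading those two fields for the current process; it scans back to the closing ')' of the comm field so that a space in the process name does not shift the field positions:

#include <stdio.h>
#include <string.h>

/* returns 0 on success, -1 on failure; values are in clock ticks (jiffies) */
int read_self_jiffies(long long *utime, long long *stime)
{
    char buf[1024];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* field 2 is "(comm)" and may contain spaces, so restart the
       parse just after its closing parenthesis */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    /* skip fields 3..13 (state, ppid, pgrp, session, tty_nr, tpgid,
       flags, minflt, cminflt, majflt, cmajflt), then read utime/stime */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lld %lld",
               utime, stime) != 2)
        return -1;
    return 0;
}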

To convert jiffies to microseconds, you can use the following:

#include <unistd.h>   /* for sysconf() */

#define USEC_PER_SEC 1000000UL

long long jiffies_to_microsecond(long long jiffies)
{
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= USEC_PER_SEC && !(USEC_PER_SEC % hz))
    {
        /* common case: hz divides a second evenly, so scale up exactly */
        return (USEC_PER_SEC / hz) * jiffies;
    }
    else if (hz > USEC_PER_SEC && !(hz % USEC_PER_SEC))
    {
        /* more than one tick per microsecond: divide, rounding up */
        return (jiffies + (hz / USEC_PER_SEC) - 1) / (hz / USEC_PER_SEC);
    }
    else
    {
        /* general case: multiply first to keep precision, then divide */
        return (jiffies * USEC_PER_SEC) / hz;
    }
}

If all you care about is per-process statistics, getrusage is easier. But if you want to be prepared to do this on a per-thread basis, this technique is better, as other than the file name, the code is identical for getting the data per-process or per-thread.

* - I'm not sure exactly when the stat file was introduced. You will need to verify your system has it.

R Samuel Klatchko
A: 

Some info here on HP's page about high-resolution timers. Also, the same trick, `_Asm_mov_from_ar(_AREG_ITC);`, is used in http://www.fftw.org/cycle.h too.

I have yet to confirm whether this can really be the solution.

Sample prog, as tested on HP-UX 11.31:

bbb@m_001/tmp/prof > ./perf_ticks 1024
ticks-memset {func [1401.000000] inline [30.000000]} noop [9.000000]
bbb@m_001/tmp/prof > cat perf_ticks.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include "cycle.h" /* one from http://www.fftw.org/cycle.h */
void test_ticks(char* sbuf, int* len){
    memset(sbuf,0,*len);
}

int main(int argc,char* argv[]){
    int len=atoi(argv[1]);
    char *sbuf=malloc(len);
    ticks t1,t2,t3,t4,t5,t6;

    t1=getticks(); test_ticks(sbuf,&len); t2=getticks();
    t3=getticks(); memset(sbuf,0,len);    t4=getticks();
    t5=getticks();                        t6=getticks();

    printf("ticks-memset {func [%f] inline [%f]} noop [%f]\n",
           elapsed(t2,t1),elapsed(t4,t3),elapsed(t6,t5));
    free(sbuf);
    return 0;
}
bbb@m_001/tmp/prof >
raghava
A: 

Resource usage of a process/thread is updated by the OS only periodically. It's entirely possible for a code snippet to complete before the next update, thus producing zero resource-usage diffs. I can't say anything about HP-UX or AIX; for Sun, I would refer you to the Solaris Performance and Tools book. For Linux you want to look at oprofile and the newer perf tool. On the profiling side, valgrind would be of much help.
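For example, with those tools installed, invocations along these lines would profile the sample program from the question (output omitted; the counters reported vary by system):

perf stat ./perf_clock 102400
valgrind --tool=callgrind ./perf_clock 102400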

Nikolai N Fetissov