ansaurus

Question

How to measure program execution time in ARM Cortex-A8 processor?

Answer 1

+1 A:

You need to profile your code with performance analysis tools before and after your optimizations.

acct is both a command-line tool and a function that you can use to monitor resource usage. You can search for more on how to use it and how to view the dat file it generates.

I will update this post with other open-source performance analysis tools.

gprof is another such tool. Please check its documentation.

Praveen S 2010-07-14 15:02:54

Praveen, the problem I have faced before with performance analysis tools (e.g. gprof) is that when I turn on the optimization flags (-O3), the statistics I get don't make any sense. It's been a while since I used gprof for that reason. I will give it another try now and see.

vikramtheone 2010-07-14 15:06:17

@vikramtheone - Suppose you create an acct file per function call; you can then get detailed information on the resources used in terms of time and other parameters. I have used this to profile and compare code optimisations at function level. Alternatively, you can fill a struct timeval using gettimeofday() and get the function execution time at microsecond resolution too. So it depends on what you want to achieve.

Praveen S 2010-07-14 15:17:05
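The gettimeofday() approach mentioned above can be sketched roughly as follows. The helper name elapsed_us is made up for illustration; it only computes the difference between two samples that you take before and after the code under test.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed between two gettimeofday() samples. */
static int64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
         + (int64_t)(end->tv_usec - start->tv_usec);
}
```

Call gettimeofday(&t0, NULL) before the function, gettimeofday(&t1, NULL) after it, and pass both to elapsed_us(&t0, &t1). This is wall-clock time, so it includes whatever else the system was doing in between.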

Praveen, I will look into acct. As for gettimeofday, I'm using it at the moment, but the measured time fluctuates a lot from run to run, so I think direct time measurement is not very appropriate. It would be more useful to measure some quantity that stays constant no matter how many processes are running, and the cycle count is such a quantity. At least for now I think it will remain constant; I don't know what truth awaits.

vikramtheone 2010-07-15 08:15:11

@vikramtheone - Well, in that case you can profile the code. You can look into getrusage(). Profiling tools like acct and gprof will also give you a view of where the execution time goes. But if you can explain clearly what type of inconsistency you are facing, you will get better answers, as profiling is a major activity before release.

Praveen S 2010-07-15 09:27:13
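As a rough sketch of the getrusage() suggestion above: unlike wall-clock time, the values it returns count only CPU time charged to the calling process, which may reduce the fluctuations caused by other processes. The helper name process_cpu_us is hypothetical, and the resolution you actually get depends on the kernel.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: CPU time (user + system) consumed so far by the
 * calling process, in microseconds. Returns -1 if getrusage() fails. */
static long long process_cpu_us(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    return (long long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}
```

Sample it before and after the code under test and subtract the two values.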

Answer 2

A:

I've worked with a toolchain for ARM7 which had an instruction-level simulator. Running apps in it could give timings for individual lines and/or asm instructions. That was great for micro-optimization of a given routine. That approach probably isn't appropriate for whole-app or whole-system optimization though.

Digikata 2010-07-14 15:09:15

Answer 3

+3 A:

Accessing the performance counters isn't difficult, but you have to enable them from kernel-mode. By default the counters are disabled.

In a nutshell you have to execute the following two lines inside the kernel. Either as a loadable module or just adding the two lines somewhere in the board-init will do:

  /* enable user-mode access to the performance counter*/
  asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

  /* disable counter overflow interrupts (just in case)*/
  asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

Once you have done this, user-mode code can access the counters. After the cycle counter has been enabled (see init_perfcounters below), it increments once per cycle. Overflows of the register will go unnoticed and don't cause any problems (except that they might mess up your measurements).
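On the overflow point: the counter is 32 bits wide, so if you take the difference of two readings in unsigned 32-bit arithmetic, the result is still correct as long as the counter wrapped at most once in between. A minimal sketch (cycles_between is a name made up for this example):

```c
#include <stdint.h>

/* Hypothetical helper: difference of two 32-bit counter readings.
 * Unsigned subtraction is modulo 2^32, so a single wrap-around
 * between start and end still yields the right delta. */
static inline uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}
```

If the counter can wrap more than once during a measurement, the delta is ambiguous and no arithmetic trick will recover it.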

Now you want to access the cycle-counter from the user-mode:

We start with a function that reads the register:

static inline unsigned int get_cyclecount (void)
{
  unsigned int value;
  // Read CCNT Register
  asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));  
  return value;
}

And you most likely want to reset the counters and set the divider as well (the int32_t type used here requires #include <stdint.h>):

static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
{
  // in general enable all counters (including cycle counter)
  int32_t value = 1;

  // perform reset:
  if (do_reset)
  {
    value |= 2;     // reset all counters to zero.
    value |= 4;     // reset cycle counter to zero.
  } 

  if (enable_divider)
    value |= 8;     // enable "by 64" divider for CCNT.

  value |= 16;      // enable export of events to the external monitoring interface.

  // program the performance-counter control-register:
  asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));  

  // enable all counters:  
  asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));  

  // clear overflows:
  asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}

do_reset will set the cycle-counter to zero. Easy as that.

enable_divider will enable the 1/64 cycle divider. Without this flag set you'll be measuring each cycle. With it enabled the counter is incremented once every 64 cycles. This is useful if you want to measure long times that would otherwise cause the counter to overflow.
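To get a feeling for when the divider is worth it, here is a small sketch that converts counter ticks back to cycles and estimates how long the 32-bit counter runs before it wraps. Both function names are made up for this example, and the clock frequency is something you have to supply for your own board.

```c
#include <stdint.h>

/* Hypothetical helper: convert a counter delta to CPU cycles.
 * With the divider enabled each tick stands for 64 cycles. */
static inline uint64_t ticks_to_cycles(uint32_t ticks, int divider_enabled)
{
    return divider_enabled ? (uint64_t)ticks * 64u : (uint64_t)ticks;
}

/* Hypothetical helper: seconds until a 32-bit counter wraps,
 * for a CPU clock given in Hz (an assumption you must supply). */
static inline double seconds_until_wrap(double cpu_hz, int divider_enabled)
{
    double ticks_per_second = divider_enabled ? cpu_hz / 64.0 : cpu_hz;

    return 4294967296.0 / ticks_per_second;   /* 2^32 ticks */
}
```

Assuming an 800 MHz clock, for example, the counter wraps after roughly 5.4 seconds without the divider and after roughly 344 seconds with it, at the price of a 64-cycle granularity.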

How to use it:

  // init counters:
  init_perfcounters (1, 0); 

  // measure the counting overhead:
  unsigned int overhead = get_cyclecount();
  overhead = get_cyclecount() - overhead;    

  unsigned int t = get_cyclecount();

  // do some stuff here..
  call_my_function();

  t = get_cyclecount() - t;

  printf ("function took exactly %u cycles (including function call)\n", t - overhead);

This should work on all Cortex-A8 CPUs.

Oh - and some notes:

Using these counters you'll measure the exact time between the two calls to get_cyclecount() including everything spent in other processes or in the kernel. There is no way to restrict the measurement to your process or a single thread.

Also calling get_cyclecount() isn't free. It will compile to a single asm-instruction, but moves from the co-processor will stall the entire ARM pipeline. The overhead is quite high and can skew your measurement. Fortunately the overhead is also fixed, so you can measure it and subtract it from your timings.

In my example I measured the overhead for every measurement. Don't do this in practice. Sooner or later an interrupt will occur between the two calls and skew your measurements even further. I suggest that you measure the overhead a couple of times on an idle system, ignore all outliers and use a fixed constant instead.
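One simple way to ignore the odd sample that was hit by an interrupt is to collect a handful of overhead samples and take the median. This is just a sketch of that idea; median_u32 and cmp_u32 are names made up for this example, and the caller must pass at least one sample.

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order of uint32_t values. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts the samples in place and returns the median.
 * A few very large samples (e.g. caused by interrupts) do not affect it.
 * Requires n > 0. */
static uint32_t median_u32(uint32_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u32);
    return samples[n / 2];
}
```

Fill the array by taking two back-to-back get_cyclecount() readings in a loop and storing each difference, run this once on an idle system, and then hard-code the result as your overhead constant.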

Nils Pipenbrinck 2010-07-14 21:50:29

Dear Nils, thank you again for such a quick and detailed reply. I want to go step by step in this approach because I wish to learn how all this works, so I started from a very basic level. I have not programmed in assembly before and I don't know all the prerequisites, so kindly bear with my ignorance. I wrote a new main file, put the first 2 lines in main(){} and compiled it using gcc. I had no compilation errors and a final executable was generated, but upon executing it I get "Illegal instruction". Have I missed anything here?

vikramtheone 2010-07-15 07:59:47

Nils, I have the CodeSourcery G++ IDE (30-day trial) installed on my Linux desktop system. I thought of building my project there (using the cross-compiler tools) and then running the executable on my i.MX515 board. I wrote the program as you described and the executable was generated. I tried debugging it (on the emulator), but CodeSourcery reported an Illegal Instruction error and stopped; that was not so important for me there anyway. I copied the executable to my i.MX515 board and tried executing it, but once again I got the Illegal instruction message :( (I have edited the question)

vikramtheone 2010-07-15 12:20:39

I did not know about the two modes in which an OS operates: kernel mode and user mode. I just found out about them from my colleague. Maybe I'm currently running in user mode, and that's the reason why I get the Illegal instruction message even though my program compiles without any errors.

vikramtheone 2010-07-15 14:20:04

@vikramtheone, the first two lines must be executed from kernel mode. They enable user-mode access to the CCNT (and related) registers. There is no way around this. In my opinion the easiest way is to write a super short kernel module that does this. Compiling such a module needs the kernel headers of the kernel you're running on your board, but since you use Ubuntu that shouldn't be a big problem. Here is a minimal kernel-module source: http://torus.untergrund.net/code/perfcnt_enable.c

Nils Pipenbrinck 2010-07-15 22:47:28

You compile it (on the target!) using make -C <path-to-kernel-source> SUBDIRS=$PWD modules. That should generate a file called perfcnt_enable.ko, which you can load (on the target) using insmod ./perfcnt_enable.ko. dmesg will tell you whether it worked or not.

Nils Pipenbrinck 2010-07-15 22:49:27
