Reducing performance variations on Linux


Q:

Hi all,

I am trying to benchmark a piece of software that runs on an Intel Pentium with Linux on top of it. The problem is that I get considerable performance variations between consecutive test runs when measuring with the RDTSC instruction. Runtimes of exactly the same piece of software vary between 5 million and 10 million clock cycles, so in the worst case the overhead is 100%. I am aware that cache contention causes some of this variation; however, is there a way to eliminate other potential sources, such as interrupts, other processes, etc.?
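For reference, a measurement of the kind described above usually looks something like the minimal sketch below. It assumes GCC or Clang on x86 (the `__rdtsc()` and `_mm_lfence()` intrinsics from `<x86intrin.h>`), and `workload()` is a hypothetical stand-in for the code being benchmarked. The LFENCE instructions stop RDTSC from executing out of order with the measured code. Note that on CPUs with a constant-rate TSC the result counts ticks at a fixed reference frequency, not actual core clock cycles.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang intrinsics: __rdtsc(), _mm_lfence() (x86 only) */

/* Hypothetical stand-in for the code being benchmarked. The volatile sink
 * keeps the compiler from optimizing the loop away. */
static volatile uint64_t sink;

static void workload(void)
{
    for (uint64_t i = 0; i < 100000; i++)
        sink += i;
}

/* Returns the number of TSC ticks that elapse while workload() runs.
 * LFENCE does not let later instructions start until all earlier ones have
 * completed, so the fences keep RDTSC from drifting into or out of the
 * measured region. */
static uint64_t measure_tsc_ticks(void)
{
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    workload();
    _mm_lfence();
    uint64_t end = __rdtsc();
    return end - start;
}
```

Calling `measure_tsc_ticks()` several times in a row and printing each result makes the run-to-run spread visible.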

I would be thankful for any tips on how to do this properly.

Many thanks, Kenny

Some general things: raise the test process priority (man 1 nice), stop as many other processes as possible, unload unused kernel modules, flush disk caches (so that background kernel threads have less work), and perhaps reboot into single-user mode?
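The priority suggestion can also be applied from inside the benchmark. A minimal sketch, assuming Linux/glibc, using `setpriority()` and `getpriority()` from `<sys/resource.h>` (the programmatic counterpart of running the benchmark under `nice -n`). The helper names are hypothetical, and negative nice values normally require root or `CAP_SYS_NICE`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Try to set the nice value of the calling process (negative = higher
 * priority). Unprivileged processes usually cannot go below their current
 * value; on failure the priority is left unchanged.
 * Returns 0 on success, -1 on failure. */
static int raise_priority(int nice_value)
{
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        fprintf(stderr, "setpriority(%d) failed: %s\n",
                nice_value, strerror(errno));
        return -1;
    }
    return 0;
}

/* Read the current nice value into *out. getpriority() can legitimately
 * return -1, so errno is cleared first to tell a real error apart. */
static int current_nice(int *out)
{
    errno = 0;
    int prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        return -1;
    *out = prio;
    return 0;
}
```

Running the benchmark as root (as in single-user mode) is what makes the negative value take effect; otherwise the call simply reports the failure.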

ygrek 2010-01-26 09:45:33

A:

Common problems in this general area are:

process migration in multi-CPU/multi-core systems
RDTSC not consistent across cores in multi-CPU/multi-core systems
other processes taking CPU time (also interrupts, I/O, screen activity, etc.)
automatic CPU clock frequency scaling
VM page faults, etc.

Solutions:

If you're running a single-threaded process on a multi-CPU/multi-core system, use CPU affinity to lock the process to a specific core. (Use taskset from the command line or call sched_setaffinity() from within your code.)
Make sure no other processes are taking CPU time, disable screen savers and other desktop animations, and make sure there are no screen updates while your code is running. Also don't use e.g. printf to a GUI console window during your code timing; save any results output until after you've collected your last timestamp. (If possible, you could even consider killing the GUI completely.)
Use a more reliable timing method than RDTSC (I typically use clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...) on Linux).
Disable automatic clock frequency scaling (e.g. Linux: cpufreq-set)
Run your code in a loop, for say N repeats, preferably re-using the same memory allocations for any large data structures (to get rid of the effects of VM page faults etc). Ignore the first measurement and average the remaining N - 1 measurements.
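Three of these points (CPU affinity, `clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...)`, and repeated runs with the first one discarded) can be combined roughly as in the minimal sketch below. It assumes Linux/glibc; `workload()`, `pin_to_cpu()` and the other helper names are hypothetical, and the buffer is allocated once by the caller and reused on every run so that page faults are paid only during the discarded first run.

```c
#define _GNU_SOURCE      /* must come first: CPU_ZERO/CPU_SET, sched_setaffinity() */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Pin the calling process to one CPU (same effect as `taskset -c <cpu>`).
 * Returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling process */
}

/* CPU time consumed so far by this process, in nanoseconds. */
static int64_t process_cpu_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical code under test: touches every byte of a reused buffer.
 * volatile keeps the compiler from removing the loop. */
static void workload(volatile unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + i);
}

/* Run workload() n times, ignore the first (warm-up) run and return the
 * mean CPU time of the remaining n - 1 runs, in nanoseconds. */
static double mean_cpu_ns(volatile unsigned char *buf, size_t len, int n)
{
    assert(n >= 2);  /* need at least one run after the discarded one */
    int64_t total = 0;
    for (int run = 0; run < n; run++) {
        int64_t t0 = process_cpu_ns();
        workload(buf, len);
        int64_t t1 = process_cpu_ns();
        if (run > 0)
            total += t1 - t0;
    }
    return (double)total / (n - 1);
}
```

Besides the mean, keeping each per-run time (or the minimum) can help show whether the spread comes from occasional outliers or from a consistently shifted distribution.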

Paul R 2010-01-26 09:46:37

I am aware of the RDTSC problems on systems with multiple cores! For that reason I disabled one of the cores at boot, so that it cannot affect my measurements. I have already considered pretty much all the other points. Thanks for your help.
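One hypothetical way to confirm from inside the benchmark that the boot-time change took effect is to ask `sysconf()` how many processors are online, as in the minimal sketch below. It assumes Linux with glibc, where `_SC_NPROCESSORS_ONLN` is supported. With hyperthreading enabled, each logical CPU is counted separately, so a single physical core can still show up as two online CPUs.

```c
#include <unistd.h>

/* Number of logical processors currently online (supported by glibc's
 * sysconf(); not part of POSIX proper). Returns -1 on error. */
static long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Returns 1 if exactly one logical CPU is online, i.e. the process cannot
 * migrate between cores and every RDTSC read comes from the same TSC. */
static int single_cpu_online(void)
{
    return online_cpus() == 1;
}
```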

Kenny 2010-01-26 09:56:10

The best way to reduce variations caused by the system environment would be running your benchmark in "single-user" mode, also known as runlevel 1 or "recovery mode".

You can boot into this mode by passing "-s" as a boot time option to the kernel, or you can switch a running system to it with "init 1".

In this mode, all daemons are stopped, and you are logged in as root. Pretty much anything that runs on the system runs from your interactive terminal.

ddaa 2010-01-26 09:48:41

That sounds good, I will give it a go!

Kenny 2010-01-26 09:54:42

Tried it, unfortunately the variations still remain in place.

Kenny 2010-01-26 10:25:25

Please make sure you deactivate frequency scaling in the BIOS and the operating system. Also it sounds like you are using a P4, so make sure you turn off hyperthreading.

I have encountered performance variations like you describe in the past, due to such things.

This page describes how to turn it on, which should give you what you need to turn it off.

You will also need to reboot your machine and check the BIOS settings to determine whether it is doing this automatically, without the operating system knowing.
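On the operating system side, the active cpufreq governor for a CPU can be read from sysfs. The minimal sketch below assumes the standard Linux cpufreq path `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`; the helper name is hypothetical. The `performance` governor keeps the CPU at its highest frequency. A missing file only means no cpufreq driver is active, which does not rule out scaling done by the BIOS behind the operating system's back.

```c
#include <stdio.h>
#include <string.h>

/* Read the cpufreq scaling governor of the given CPU into buf.
 * Returns 0 on success, -1 if the sysfs file is missing or unreadable. */
static int read_governor(int cpu, char *buf, size_t len)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    return 0;
}
```

A benchmark could call `read_governor(0, buf, sizeof buf)` at startup and warn when the result is anything other than `performance`.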

Alex Brown 2010-01-26 09:56:56

Thanks for the clue. So are you saying that I should check in the BIOS first if I can disable frequency scaling before tackling this problem at the OS level? Or do I also need to make the changes in the OS? Cheers

Kenny 2010-01-26 10:22:29

Fixing the BIOS is easier, and if you don't fix it you won't make any headway with the OS, so do it first.

Alex Brown 2010-01-26 11:49:18

Have you considered running the code inside Valgrind's Cachegrind or Callgrind tools? These should be able to give you accurate instruction counts by running the code through Valgrind's "VM".

Michael Anderson 2010-01-26 13:09:32
