views: 190

answers: 7

Hi,

I'm attempting to time code using RDTSC (no other profiling software I've tried is able to time to the resolution I need) on Ubuntu 8.10. However, I keep getting outliers from task switches and interrupts firing, which are causing my statistics to be invalid.

Considering my program runs in a matter of milliseconds, is it possible to disable all interrupts (which would inherently switch off task switches) in my environment? Or do I need to go to an OS which allows me more power? Would I be better off using my own OS kernel to perform this timing code? I am attempting to prove an algorithm's best/worst case performance, so it must be totally solid with timing.

The relevant code I'm using currently is:

inline uint64_t rdtsc()
{
    uint32_t lo, hi;
    /* The "=A" constraint only splits the result across EDX:EAX on
       32-bit x86; reading the two halves explicitly also works when
       compiled as 64-bit code. */
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

void test(int readable_out, uint32_t start, uint32_t end, uint32_t (*fn)(uint32_t, uint32_t))
{
    int i;
    for(i = 0; i <= 100; i++)
    {
        uint64_t clock1 = rdtsc();
        uint32_t ans = fn(start, end);
        uint64_t clock2 = rdtsc();

        uint64_t diff = clock2 - clock1;

        if(readable_out)
            printf("[%3d]\t\t%u [%llu]\n", i, ans, diff);
        else
            printf("%llu\n", diff);
    }
}

Extra points to those who notice I'm not properly handling overflow conditions in this code. At this stage I'm just trying to get a consistent output without sudden jumps due to my program losing the timeslice.

The nice value for my program is -20.

So to recap, is it possible for me to run this code without interruption from the OS? Or am I going to need to run it on bare hardware in ring0, so I can disable IRQs and scheduling? Thanks in advance!

+2  A: 

Tricky. I don't think you can turn the operating system 'off' and guarantee strict scheduling.

I would turn this upside down: given that it runs so fast, run it many times to collect a distribution of outcomes. Since standard Ubuntu Linux is not a real-time OS in the narrow sense, all alternative algorithms would run in the same setup, and you can then compare your distributions (using anything from summary statistics to quantiles to Q-Q plots). You can do that comparison with Python, or R, or Octave, ... whichever suits you best.

Dirk Eddelbuettel
I have been working on that (hence the for loop). The problem is that the outliers are significantly larger than the rest of the data which throws off my average and standard deviation.
Matthew Iselin
But now we're having a statistical discussion :-) Mean and standard deviation fully describe a normal distribution, yet your data may not be normally distributed. Try to describe the tail, especially the 'bad' one with quantiles and plots if you can. It is this worst case you are comparing against alternatives ... which may well be worse. [ Of course, the rest of the distribution needs descriptives too... ]
Dirk Eddelbuettel
A valid point. It is frustrating though to have over 90 values around 11,000 - 12,000 and then have some value hit 50,000 and throw off the entire thing. It's just those extra values which are *so wrong* that they interfere significantly with the statistics I'm trying to generate.
Matthew Iselin
A: 

If you run as root, you can call sched_setscheduler() and give yourself a real-time priority. Check the documentation.

William Pursell
+2  A: 

You might be able to get away with running FreeDOS, since it's a single-process OS.

Here's the relevant text from the second link:

Microsoft's DOS implementation, which is the de facto standard for DOS systems in the x86 world, is a single-user, single-tasking operating system. It provides raw access to hardware, and only a minimal layer for OS APIs for things like the file I/O. This is a good thing when it comes to embedded systems, because you often just need to get something done without an operating system in your way.

DOS has (natively) no concept of threads and no concept of multiple, on-going processes. Application software makes system calls via the use of an interrupt interface, calling various hardware interrupts to handle things like video and audio, and calling software interrupts to handle various things like reading a directory, executing a file, and so forth.

Of course, you'll probably get the best performance actually booting FreeDOS onto actual hardware, not in an emulator.

I haven't actually used FreeDOS, but I assume that since your program seems to be standard C, you'll be able to use whatever the standard compiler is for FreeDOS.

Mark Rushakoff
A good link - good ol' DOS gives up the computer to running programs, which would work really well for this kind of thing.
Matthew Iselin
+1  A: 

You can use chrt -f 99 ./test to run ./test with the maximum realtime priority. Then at least it won't be interrupted by other user-space processes.

Also, installing the linux-rt package will install a real-time kernel, which will give you more control over interrupt handler priority via threaded interrupts.

Karl Voigtland
Additionally, using the Linux RT patches (http://rt.wiki.kernel.org/index.php/Main_Page) will minimize latency from interrupts
bdonlan
+2  A: 

If your program runs in milliseconds, and if you are running on Linux, make sure that your timer frequency (on Linux) is set to 100Hz (not 1000Hz). (cd /usr/src/linux; make menuconfig, and look at "Processor type and features" -> "Timer frequency") This way your CPU will only get interrupted every 10ms.

Furthermore, consider that the default CPU time slice on Linux is 100ms, so with a nice level of -20, you will not get descheduled if you are running for a few milliseconds.

Also, you are looping 101 times on fn(). Consider passing a no-op as fn() to calibrate your measurement overhead properly.

Compute statistics (average + stddev) instead of printing every sample: the printf calls consume your scheduled timeslice, and the terminal will eventually get scheduled, etc... avoid that.

RDTSC benchmark sample code

Nicolas Viennot
Thanks for the tips.
Matthew Iselin
+2  A: 

If you call nanosleep() to sleep for a second or so immediately before each iteration of the test, you should get a "fresh" timeslice for each test. If you compile your kernel with a 100Hz timer interrupt, and your timed function completes in under 10ms, then you should be able to avoid timer interrupts hitting you that way.

To minimise other interrupts, deconfigure all network devices, configure your system without swap and make sure it's otherwise quiescent.

caf
I have tried something similar with sched_yield(), which has improved the set of results a fair bit.
Matthew Iselin
A: 

Maybe there is some way to disable preemptive scheduling on linux, but it might not be needed. You could potentially use information from /proc/<pid>/schedstat or some other object in /proc to sense when you have been preempted, and disregard those timing samples.

TokenMacGuy