views:

626

answers:

3

I was benchmarking a large scientific application, and found it would sometimes run 10% slower given the same inputs. After much searching, I found the the slowdown only occurred when it was running on core #2 of my quad core CPU (specifically, an Intel Q6600 running at 2.4 GHz). The application is a single-threaded and spends most of its time in CPU-intensive matrix math routines.

Now that I know one core is slower than the others, I can get accurate benchmark results by setting the processor affinity to the same core for all runs. However, I still want to know why one core is slower.

I tried several simple test cases to determine the slow part of the CPU, but the test cases ran with identical times, even on slow core #2. Only the complex application showed the slowdown. Here are the test cases that I tried:

  • Floating point multiplication and addition:

    accumulator = accumulator*1.000001 + 0.0001;
    
  • Trigonometric functions:

    accumulator = sin(accumulator);
    accumulator = cos(accumulator);
    
  • Integer addition:

    accumulator = accumulator + 1;
    
  • Memory copy while trying to make the L2 cache miss:

    int stride = 4*1024*1024 + 37;  // L2 cache size + small prime number
    for(long iter=0; iter<iterations; ++iter) {
        for(int offset=0; offset<stride; ++offset) {
            for(i=offset; i<array_size; i += stride) {
                array1[i] = array2[i];
            }
        }
    }
    

The Question: Why would one CPU core be slower than the others, and what part of the CPU is causing that slowdown?

EDIT: More testing showed some Heisenbug behavior. When I explicitly set the processor affinity, then my application does not slow down on core #2. However, if it chooses to run on core #2 without an explicitly set processor affinity, then the application runs about 10% slower. That explains why my simple test cases did not show the same slowdown, as they all explicitly set the processor affinity. So, it looks like there is some process that likes to live on core #2, but it gets out of the way if the processor affinity is set.

Bottom Line: If you need to have an accurate benchmark of a single-threaded program on a multicore machine, then make sure to set the processor affinity.

+2  A: 

Most modern cpu's have separate throttling of each cpu core due to overheating or power saving features. You may try to turn off power-saving or improve cooling. Or maybe your cpu is bad. On my i7 I get about 2-3 degrees different core temperatures of the 8 reported cores in "sensors". At full load there is still variation.

krosenvold
+4  A: 

You may have applications that have opted to be attached to the same processor(CPU Affinity).

Operating systems would often like to run on the same processor as they can have all their data cached on the same L1 cache. If you happen to run your process on the same core that your OS is doing a lot of its work on, you could experience the effect of a slowdown in your cpu performance.

It sounds like some process is wanting to stick to the same cpu. I doubt it's a hardware issue.

It doesn't necessarily have to be your OS doing the work, some other background daemon could be doing it.

Kekoa
A: 

Another possibility is that the process is being migrated from one core to another while running. I'd suggest setting the CPU affinity to the 'slow' core and see if it's just as fast that way.

Years ago, before the days of multicore, I bought myself a dual-socket Athlon MP for 'web development'. Suddenly my Plone/Zope/Python web servers slowed to a crawl. A google search turned up that the CPython interpreter has a global interpreter lock, but Python threads are backed by OS threads. OS Threads were evenly distributed among the CPUs, but only one CPU can acquire the lock at a time, thus all the other processes had to wait.

Setting Zope's CPU affinity to any CPU fixed the problem.

shapr