I once noticed that Windows doesn't keep a computation-intensive thread on a specific core; it keeps switching cores instead. So I speculated that the job would finish faster if the thread kept access to the same data caches. And indeed, I observed a stable ~1% speed improvement after setting the thread's affinity mask to a single core (in a ppmd (de)compression thread). But then I tried to build a simple demo of this effect and more or less failed. That is, it works as expected on my system (Q9450):

buflog=21 bufsize=2097152
(cache flush) first run    = 6.938s
time with default affinity = 6.782s
time with first core only  = 6.578s
speed gain is 3.01%

but the people I asked weren't able to reproduce the effect. Any suggestions?

#include <stdio.h>
#include <stdlib.h>  // atoi
#include <windows.h>
int buflog=21, bufsize, bufmask;
char* a;
char* b;
volatile int r = 0;
__declspec(noinline)
int benchmark( char* a ) {
  int t0 = GetTickCount();
  int i,h=1,s=0;
  for( i=0; i<1000000000; i++ ) {
    h = h*200002979 + 1;
    s += ((int&)a[h&bufmask]) + ((int&)a[h&(bufmask>>2)]) + ((int&)a[h&(bufmask>>4)]);
  } r = s;
  t0 = GetTickCount() - t0;
  return t0;
}
DWORD WINAPI loadcore( LPVOID ) {
  SetThreadAffinityMask( GetCurrentThread(), 2 );
  while(1) benchmark(b);
}
int main( int argc, char** argv ) {
  if( (argc>1) && (atoi(argv[1])>16) ) buflog=atoi(argv[1]);
  bufsize=1<<buflog; bufmask=bufsize-1;
  a = new char[bufsize+4];
  b = new char[bufsize+4];
  printf( "buflog=%i bufsize=%i\n", buflog, bufsize );
  CreateThread( 0, 0, &loadcore, 0, 0, 0 );
  printf( "(cache flush) first run    = %.3fs\n", float(benchmark(a))/1000 );
  float t1 = benchmark(a); t1/=1000;
  printf( "time with default affinity = %.3fs\n", t1 );
  SetThreadAffinityMask( GetCurrentThread(), 1 );
  float t2 = benchmark(a); t2/=1000;
  printf( "time with first core only  = %.3fs\n", t2 );
  printf( "speed gain is %4.2f%%\n", (t1-t2)*100/t1 );
  return 0;
}

P.S. I can post a link to a compiled version if anybody needs it.

A: 

Maybe you are just lucky, and on the other PCs where you tested the program, someone did exactly the same thing as you did, but his thread is sleeping a lot.

That would lead to your program being interrupted every now and then, when the other thread gets scheduled.

Christopher
Actually, that was the thing I initially wanted to ask here: how to determine the best core to set affinity to. But then the demo I made to show the effect turned out to be questionable by itself. And no, I even checked some programs (like rar and 7-zip) where I could expect to see it, and there's nothing. So the behaviour of this demo likely changes depending on cache sizes and such.
Shelwien
You cannot select the processor that is best for your thread. The best approach is to let the OS decide. Thread affinity may help on your system, but it may reduce performance on another system. There may be a couple of things in Windows that have an (implicit) processor affinity. For one, DPCs seem to get queued on the same processor they were queued from. So drivers may always steal time from the same processor and never get scheduled onto another one.
Christopher
I was happily using the speed boost until I got the same idea (that my target core may already be used by something else). Then the obvious solution seemed to be to determine which core the thread currently runs on and stick to it. But GetCurrentProcessorNumber() is Vista+ and I didn't find any alternative for older Windows versions.
Shelwien
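The "find the current core and stick to it" idea from the comment above could be sketched roughly as follows. This is only an illustration, not something from the thread: `single_core_mask` and `pin_to_current_core` are hypothetical helper names, and the sketch assumes that looking up `GetCurrentProcessorNumber` at run time through `GetProcAddress` is an acceptable way to keep the binary loadable on Windows versions that lack the export. Where the function is missing, the helper leaves the affinity untouched, so it does not solve the older-Windows case.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Affinity mask with only bit `core` set (core 0 -> 1, core 1 -> 2, ...).
inline std::uintptr_t single_core_mask(unsigned core) {
    assert(core < sizeof(std::uintptr_t) * 8);
    return std::uintptr_t(1) << core;
}

// Hypothetical helper: pin the calling thread to the core it is running on.
// Returns the core index on success, -1 if the core could not be determined
// or the affinity could not be set (the affinity is then left unchanged).
inline int pin_to_current_core() {
#ifdef _WIN32
    typedef DWORD (WINAPI *GetCpuFn)(void);
    HMODULE k32 = GetModuleHandleA("kernel32.dll");
    // Resolve the export dynamically so the program still starts on systems
    // where kernel32.dll does not provide GetCurrentProcessorNumber.
    GetCpuFn get_cpu =
        k32 ? (GetCpuFn)GetProcAddress(k32, "GetCurrentProcessorNumber") : 0;
    if (!get_cpu) return -1;
    DWORD core = get_cpu();
    if (core >= sizeof(DWORD_PTR) * 8) return -1;
    // The scheduler may move the thread between the query and this call;
    // the thread then simply ends up pinned to the core it was on a moment ago.
    DWORD_PTR mask = (DWORD_PTR)single_core_mask(core);
    if (!SetThreadAffinityMask(GetCurrentThread(), mask)) return -1;
    return (int)core;
#else
    return -1;  // not a Windows build: nothing to do in this sketch
#endif
}
```

In the demo from the question, `pin_to_current_core()` would take the place of the fixed `SetThreadAffinityMask( GetCurrentThread(), 1 )` call before the last `benchmark(a)` run. Whether the core chosen this way is actually a good one is exactly the open question of this thread.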
A: 
  1. Windows doesn't deliberately swap processes between CPUs. If it did it to you, you were just unlucky.
  2. You might get a minor speed gain if you are getting a lot of cache hits; it depends on your application. (Unless you have some big iron with a funky NUMA memory architecture, which can cause all sorts of dependencies.)
  3. In your case, why not just increase the process priority so that it never gets swapped off the CPU?
Michael J
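For anyone who wants to try point 3 against the affinity approach, a minimal sketch of raising the priority might look like this. `raise_priority` is a hypothetical helper name, and `HIGH_PRIORITY_CLASS` with `THREAD_PRIORITY_HIGHEST` is just one plausible choice of levels. Note the assumption being tested: a higher priority affects which thread gets CPU time, and the sketch makes no claim that it stops the scheduler from moving the thread between cores.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: raise the priority of the current process and of the
// calling thread. Returns true only if both calls succeed.
inline bool raise_priority() {
#ifdef _WIN32
    if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
        return false;
    return SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST) != 0;
#else
    return false;  // not a Windows build: nothing to do in this sketch
#endif
}
```

Calling `raise_priority()` at the start of `main` in the demo, and then comparing the "default affinity" and "first core only" timings again, would show whether priority alone closes the gap on a given machine.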
Damn, no image support in comments, so I'm posting this as an "answer". And it's not a priority problem, it's a scheduling problem. And I never said that there's a huge gain from setting affinity, but getting a program to run 1% faster with a single API call is not quite useless either.
Shelwien
+1  A: 

[Image: default affinity]

[Image: affinity set to core #4]

Now, this is an archiver. Do you really think that the worker thread going all around the CPU is OK?

Shelwien
A: 

How do you know the other 3 cores are being used by your thread and not some system threads? For example if you are paging or something. Set up some performance counters on your process in perfmon and verify this assumption.

tholomew
I know that simply because it's idle. That's a dedicated testing machine; there are no other active processes. Also, that test even runs on a ramdrive. The test process has other threads though, which shows in the snapshot with fixed affinity.
Shelwien