The Windows API provides critical sections in which a waiting thread spins a limited number of times before context switching, but only on a multiprocessor system. Such a critical section is created with InitializeCriticalSectionAndSpinCount. (See http://msdn.microsoft.com/en-us/library/ms682530.aspx.) This is efficient when a critical section is usually held only for a short time, so contention should not immediately trigger a context switch. Two related questions:

  1. For a high-level, cross-platform threading library or an implementation of a synchronized block, is a small amount of spinning before triggering a context switch a good default?
  2. What, if anything, is the equivalent of InitializeCriticalSectionAndSpinCount on other OSes, especially POSIX?

Edit: Of course no spin count will be optimal for all cases. I'm only interested in whether using a nonzero spin count would be a better default than not using one.

+3  A: 

My opinion is that the optimal spin count for best application performance is too hardware-dependent for it to be an important part of a cross-platform API, and you should probably just use mutexes (in POSIX, pthread_mutex_init / destroy / lock / trylock) or spin locks (pthread_spin_init / destroy / lock / trylock). Rationale follows.

What's the point of the spin count? Basically, if the lock owner is running simultaneously with the thread attempting to acquire the lock, then the lock owner might release the lock quickly enough that the EnterCriticalSection caller could avoid giving up CPU control in acquiring the lock, improving that thread's performance, and avoiding context switch overhead. Two things:

1: obviously this relies on the lock owner running in parallel to the thread attempting to acquire the lock. This is impossible on a single execution core, which is almost certainly why Microsoft treats the count as 0 in such environments. Even with multiple cores, it's quite possible that the lock owner is not running when another thread attempts to acquire the lock, and in such cases the optimal spin count (for that attempt) is still 0.

2: with simultaneous execution, the optimal spin count is still hardware dependent. Different processors will take different amounts of time to perform similar operations. They have different instruction sets (the ARM I work with most doesn't have an integer divide instruction), different cache sizes, the OS will have different pages in memory... Decrementing the spin count may take a different amount of time on a load-store architecture than on an architecture in which arithmetic instructions can access memory directly. Even on the same processor, the same task will take different amounts of time, depending on (at least) the contents and organization of the memory cache.

If the optimal spin count with simultaneous execution is infinite, then the pthread_spin_* functions should do what you're after. If it is not, then use the pthread_mutex_* functions.
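
For a finite, nonzero spin count, the behaviour can be approximated on top of pthread_mutex_trylock. The following is only a minimal sketch under stated assumptions, not an equivalent of the Windows implementation: the spin_mutex type and its functions are hypothetical names invented for this illustration, and sysconf(_SC_NPROCESSORS_ONLN) is a widely available extension (Linux, macOS, the BSDs) rather than something POSIX guarantees. It forces the count to 0 when only one core is online, for the reason given in point 1.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper type: a plain mutex plus a bounded spin count. */
typedef struct {
    pthread_mutex_t mutex;
    unsigned spin_count;   /* trylock attempts made before blocking */
} spin_mutex;

/* Returns 0 on success, otherwise the pthread_mutex_init error code. */
int spin_mutex_init(spin_mutex *m, unsigned spin_count)
{
    /* Assumption: _SC_NPROCESSORS_ONLN is supported on this platform.
       With one core (or an unknown count) spinning cannot help, so use 0. */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    m->spin_count = (cores > 1) ? spin_count : 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

/* Try the lock up to spin_count times, then fall back to a blocking lock. */
int spin_mutex_lock(spin_mutex *m)
{
    for (unsigned i = 0; i < m->spin_count; ++i) {
        if (pthread_mutex_trylock(&m->mutex) == 0)
            return 0;      /* acquired without blocking */
    }
    return pthread_mutex_lock(&m->mutex);
}

int spin_mutex_unlock(spin_mutex *m)
{
    return pthread_mutex_unlock(&m->mutex);
}

int spin_mutex_destroy(spin_mutex *m)
{
    return pthread_mutex_destroy(&m->mutex);
}
```

Usage would look like spin_mutex_init(&m, 4000); spin_mutex_lock(&m); ... spin_mutex_unlock(&m);, where 4000 is just an example value. Whether any particular count helps still has to be measured on the target hardware, for the reasons in point 2.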

Aidan Cully
I agree with you: spin counting was never a good idea, and hopefully it will be removed in new CPU generations with much more efficient hardware- and scheduler-supported blocking/unblocking. And by the way, for the OP: the new slim reader/writer lock implementation in Vista/Windows 7 is faster than critical sections anyway, so it is recommended to use that instead of critical sections with a spin count.
Lothar