To our great surprise, we recently discovered that with SP1 for Windows Server 2003 Microsoft changed the way critical sections behave. Previously, threads waiting to enter a critical section were served in FIFO order. Now they are served in an essentially "random" order.

In our case we had something like this:

// I know it's kind of an ugly design, but it works
void Class::RunInThread()
{
   while(m_Running)
   {
       EnterCriticalSection(&m_CS);
       DoSomeStuffWithList();
       LeaveCriticalSection(&m_CS);
   }
}

void Class::AddToList()
{
   EnterCriticalSection(&m_CS);
   AddSomeStuffToList();
   LeaveCriticalSection(&m_CS);
}

So with the new implementation of critical sections in 2003 SP1, AddToList might starve, since there is no guarantee that it will ever be woken up while RunInThread keeps re-acquiring the section.

This example is a little bit extreme, but on the other hand I have millions of lines of code that were written with the assumption that waiting threads enter a critical section in arrival order.

Is there a way to turn off this new critical section behavior?

EDIT: Since getting the old version back is not possible, I am thinking of just doing a global Search&Replace to change {Enter,Leave}CriticalSection into something like My{Enter,Leave}CriticalSection. Do you have ideas on how this should be implemented so that it behaves exactly like the pre-SP1 version?
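
One possible shape for such a wrapper is a ticket lock: each thread takes a number on entry and waits until that number is served, so waiters get in strictly in arrival order. The sketch below is only an illustration in portable standard C++ rather than Win32; FifoLock, Enter, Leave and CountWithFifoLock are hypothetical names, and it makes no claim to match the pre-SP1 behavior exactly. In particular it is not recursive, whereas a CRITICAL_SECTION can be re-entered by its owning thread.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical FIFO ("ticket") lock; not part of the Win32 API.
// Assumption: non-recursive, unlike CRITICAL_SECTION.
class FifoLock {
public:
    void Enter() {
        std::unique_lock<std::mutex> guard(m_mutex);
        // Take the next ticket; tickets are handed out in arrival order.
        const std::uint64_t my_ticket = m_next_ticket++;
        // Sleep until it is this ticket's turn.
        m_cv.wait(guard, [&] { return my_ticket == m_now_serving; });
    }

    void Leave() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            ++m_now_serving;  // hand the lock to the next ticket holder
        }
        // Wake all waiters; only the one holding the served ticket proceeds.
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;            // protects the two counters below
    std::condition_variable m_cv;
    std::uint64_t m_next_ticket = 0;
    std::uint64_t m_now_serving = 0;
};

// Small demo: several threads increment a shared counter under the lock.
int CountWithFifoLock(int thread_count, int iterations) {
    FifoLock lock;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.Enter();
                ++counter;
                lock.Leave();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

With a lock like this, a thread that leaves and immediately re-enters gets a later ticket than anyone already waiting, so the AddToList caller above could not be starved by the loop. The cost is that every hand-off wakes all waiters, which is likely to be slower than a plain critical section under heavy contention.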

+2  A: 

Unfortunately, you have a problem. What you have done is write your code depending on an implementation detail, and not to the spec.

EnterCriticalSection has always been documented as not guaranteeing any particular order in which threads acquire the section. The fact that older versions of the operating system happened to serve them in FIFO order is what you have based your code on.

The only way to turn off this new behavior is not to install SP1.

Now, having said that, I don't believe that there will be serious problems with your code, unless you have given your threads wildly different priorities. Sure, one of the two methods might acquire the section more than once in a row, even though the other method is also waiting, but that shouldn't be a problem.

Lasse V. Karlsen
There can be actual problems. In particular, I've encountered them on a legacy project where all non-thread-safe code had been locked in one giant critical section. You can say it should not have been a critical section in the first place, but still, it worked on XP and broke on 2003. And it was real code.
EFraim
But again, it relied on implementation details and not on the spec. As you posted in your answer, locks are best held for a very short amount of time, if possible. Of course badly written code will have problems with such changes, no doubt about that. Judging by the article in the link you posted, thread handling, locks, and scheduling are not for the layman; you need to be VERY certain that you're doing it correctly. "Happens to work" is perhaps good enough now, but isn't future-proof.
Lasse V. Karlsen
A: 

That's a known problem: http://www.bluebytesoftware.com/blog/PermaLink,guid,e40c2675-43a3-410f-8f85-616ef7b031aa.aspx Unfortunately, the only way around it appears to be to structure the code so that it spends less time inside the critical section.
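
As a rough illustration of "less time inside the critical section", one common pattern is to hold the lock only long enough to swap the shared list out, and then process the batch with the lock released. This is a minimal sketch in standard C++ under stated assumptions: WorkQueue, Add and TakeAll are hypothetical names, and the list is assumed to hold plain ints. It does not restore FIFO ordering; it only shrinks the window in which other threads can be blocked.

```cpp
#include <mutex>
#include <vector>

// Hypothetical shared list that keeps lock hold times short.
class WorkQueue {
public:
    // Producer side: the lock is held only for one push_back.
    void Add(int item) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_items.push_back(item);
    }

    // Consumer side: the lock is held only for the swap. The caller
    // then processes the returned batch without holding the lock.
    std::vector<int> TakeAll() {
        std::vector<int> batch;
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            batch.swap(m_items);
        }
        return batch;
    }

private:
    std::mutex m_mutex;        // protects m_items
    std::vector<int> m_items;
};
```

A worker loop would call TakeAll(), do its processing on the returned batch, and only then come back for more, so a thread calling Add() has a much larger window in which the lock is free.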

EFraim
