I have a program with several threads. One thread changes a global when it finishes its work, and another thread repeatedly polls that global. The globals have no protection of any kind. The program works fine on a uniprocessor machine. On a dual-core machine it works for a while and then hangs on either Sleep(0) or SuspendThread(). Can anyone help me out with this?

The code looks like this:

Thread 1:

do something...
while (1)
{
    .....
    flag_thread1_running = false;
    SuspendThread(GetCurrentThread());
    continue;
}

Thread 2:

flag_thread1_running = true;
ResumeThread(thread1);
.....do some other work here....
while (flag_thread1_running) Sleep(0);
....
A: 
D.Shawley
Thank you for the comments. I have declared the globals 'volatile', but it does not help.
Shangping Guo
Actually, I've seen different opinions on the volatile keyword. Some say it is useless and makes the program much slower. So far, in practice, I have not seen any change.
Shangping Guo
Why don't you just use an atomic variable?
WhirlWind
Hello WhirlWind, thanks for the suggestion. I am not yet experienced in multithreaded programming, but in my opinion there is no need for an atomic operation on the global, since there are no concurrent writes to it. Kindly correct me if I am wrong.
Shangping Guo
Funny thing: at least on x86, atomic set and read operations compile to normal memory instructions. But the atomic types are coded so that the compiler won't optimize around their operations.
Karmastan
@karmastan: It could be that the memory writes are indeed already atomic for most types. Disabling the compiler optimizations is what I was after since the flag is most likely cached in a register somewhere and not being read from memory.
D.Shawley
Has anyone noticed that the program runs fine on a uniprocessor machine but deadlocks on the dual-core machine? Any hints on this?
Shangping Guo
@ShangpingGuo: my guess is that it is a problem with coherency in the memory cache; see http://en.wikipedia.org/wiki/Cache_coherence for some of the details. I would recommend using `CreateEvent()`, `SetEvent()`, and `WaitForSingleObject()` to implement proper communication between your threads instead of using `SuspendThread()`. The MSDN document on `SuspendThread` does mention that it is **not to be used for thread synchronization**.
D.Shawley
Thank you Shawley, this really works.
Shangping Guo
+2  A: 

Try using something more like WaitForSingleObjectEx instead of SuspendThread.

bta
Definitely. SuspendThread / ResumeThread used like this can lead to hard-to-debug race conditions, which hit you much more frequently on multi-CPU systems. Using proper synchronization primitives is a must. See also http://stackoverflow.com/questions/131818/is-putting-thread-on-hold-optimal
Suma
+2  A: 

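You are hitting a race condition. Thread 2 may execute flag_thread1_running=true; before thread 1 executes flag_thread1_running=false;. For example, if thread 2 sets the flag and calls ResumeThread(thread1) in the window after thread 1 has cleared the flag but before it has called SuspendThread(), the resume does nothing, thread 1 then suspends itself, and nothing ever wakes it again while thread 2 keeps polling.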

This is not likely to happen on a single CPU, because with the usual scheduling quantum of 10-20 ms a thread switch rarely lands in exactly the wrong spot. It can happen there as well, but very rarely.

Using proper synchronization primitives is a must here. Instead of a bool, use an event. Instead of checking the bool in a loop, use WaitForSingleObject (or WaitForMultipleObjects for more elaborate stuff later).
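
As a rough illustration of that advice, here is a minimal sketch of the same start/done handshake built on events. It is an assumption-laden stand-in, not the Win32 code itself: `AutoResetEvent` and `run_handshake` are hypothetical names, and the event is emulated with the C++ standard library (`std::mutex` plus `std::condition_variable`) so the example is portable. In a Win32 program, `set()` would roughly correspond to `SetEvent()` and `wait()` to `WaitForSingleObject()` on an auto-reset event from `CreateEvent()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a Win32 auto-reset event.
// A signal is remembered until one waiter consumes it, so a set()
// that happens before wait() is never lost.
class AutoResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // auto-reset: consume the signal
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Runs `rounds` start/done handshakes between two threads and
// returns how many work items the worker completed.
int run_handshake(int rounds) {
    AutoResetEvent start, done;
    int completed = 0;  // written by the worker, read here only after join()

    std::thread worker([&] {
        for (int i = 0; i < rounds; ++i) {
            start.wait();  // takes the place of SuspendThread(GetCurrentThread())
            ++completed;   // "do something..."
            done.set();    // takes the place of flag_thread1_running = false
        }
    });

    for (int i = 0; i < rounds; ++i) {
        start.set();       // takes the place of ResumeThread(thread1)
        // ... do some other work here ...
        done.wait();       // takes the place of while (flag_thread1_running) Sleep(0);
    }

    worker.join();
    return completed;
}
```

Because each event remembers its signal, it does not matter which thread reaches its side of the handshake first, which is exactly the property the Suspend/Resume version lacks.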

It is possible to perform synchronization between threads using plain variables, but it is rarely a good idea and it is quite hard to do it right - cf. "How can I write a lock free structure?". It is definitely not a good idea to perform scheduling using Sleep, Suspend or Resume.
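
If a polled flag is kept anyway, one way to make the flag itself well-defined is an atomic variable. The following is only a sketch under assumptions: it uses C++11 `std::atomic<bool>` rather than anything from the original program, and `worker_running` and `run_once` are hypothetical names. It addresses visibility and ordering of the flag only; it does not make SuspendThread/ResumeThread safe.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical flag name. Atomic loads and stores cannot be cached in a
// register or hoisted out of the polling loop by the compiler.
std::atomic<bool> worker_running{false};

// Starts one worker, polls the flag until the worker clears it,
// and returns the value the worker produced.
int run_once() {
    int result = 0;

    // Set the flag before the worker exists, so the worker's later
    // store(false) can never be overwritten by this store(true).
    worker_running.store(true);

    std::thread worker([&] {
        result = 42;                  // the "work"
        worker_running.store(false);  // publishes `result` along with the flag
    });

    // Polling loop: each iteration performs a real atomic load.
    while (worker_running.load()) {
        std::this_thread::yield();
    }

    // The atomic store/load pair orders the write to `result`
    // before this read.
    int seen = result;

    worker.join();
    return seen;
}
```

Note that the polling loop still burns CPU time while it waits, which is one reason a blocking wait on an event is usually the better fit.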

Suma
I just tried WaitForSingleObject with an event and abandoned the polling mechanism. So far it works fine on the dual-core machine. Thank you for the useful hints.
Shangping Guo
+11  A: 

The fact that you don't see any problem on a uniprocessor machine, but see problems on a multiproc machine is an artifact of the relatively large granularity of thread context switching on a uniprocessor machine. A thread will execute for N amount of time (milliseconds, nanoseconds, whatever) before the thread scheduler switches execution to a different thread. A lot of CPU instructions can execute in the typical thread timeslice. You can think of it as having a fairly large chunk of "free play" exclusive processor time during which you probably won't run into resource collisions because nothing else is executing on the processor.

When running on a multiproc machine, though, CPU instructions in two threads execute exactly at the same time. The size of the "free play" chunk of time is near zero.

To reproduce a resource contention issue between two threads, you need to get thread 1 to be accessing the resource and thread 2 to be accessing the resource at the same time, or very nearly the same time.

In the large-granularity thread switching that takes place on a uniprocessor machine, the chances that a thread switch will happen exactly in the right spot are slim, so the program may never exhibit a failure under normal use on a uniproc machine.

In a multiproc machine, the instructions are executing at the same time in the two threads, so the chances of thread 1 and thread 2 accessing the same resource at the same time are much, much greater - thousands of times more likely than the uniprocessor scenario.

I've seen it happen many times: an app that has been running fine for years on uniproc machines suddenly starts failing all over the place when executed on a new multiproc machine. The cause is a latent threading bug in the original code that simply never hit the right coincidence of timeslicing to repro on the uniproc machines.

When working with multithreaded code, it is absolutely imperative to test the code on multiproc hardware. If you have thread collision issues in your code, they will quickly present themselves on a multiproc machine.

As others have noted, don't use SuspendThread() unless you are a debugger. Use mutexes or other synchronization objects to coordinate between threads.
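
To make the mutex suggestion concrete, here is a small sketch, assuming the C++ standard library rather than Win32 mutex objects. `count_with_mutex` is a hypothetical helper: two threads increment one shared counter, and the lock ensures that no increment is lost even when both threads really do run at the same time on separate cores.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Two threads each add `per_thread` to a shared counter.
// The mutex serializes the read-modify-write on `counter`.
long count_with_mutex(int per_thread) {
    long counter = 0;
    std::mutex m;

    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> lock(m);  // held for one increment only
            ++counter;
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    return counter;  // always 2 * per_thread
}
```

Without the lock, the increments would be a data race, and on multiproc hardware the total would typically come up short.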

dthorpe
+1 This sounds so much like the debugging talks you did at one of the BorCons (was it "Reading the Tea Leaves"?). Can I quote you on this one?
Jeroen Pluimers
Yes, I'm pretty sure I would have mentioned this in the "Reading Tea Leaves" talk. Quote away!
dthorpe