views:

1712

answers:

7

When writing multi-threaded applications, one of the most common problems experienced are race conditions.

My question to the community, is:

What is a race condition? How do you detect them? How do you handle them? And finally, how do you prevent them from occurring?

+1  A: 

There is a great chapter in the Secure Programming for Linux HOWTO that describes what they are, and how to avoid them.

Craig H
+5  A: 

A race condition is a kind of bug, that happen only with certain temporal conditions.

Example: Imagine you have two threads A and B.

In Thread A:

if( object.a != 0 )
    object.avg = total / object.a

In Thread B:

object.a = 0

If thread A is preempted just after having check that object.a is not null, B will do a = 0, and when thread A will gain the processor, it will do a "divide by zero". This bug only happen when thread A is preempted just after the if statement, it's very rare, but it can happen.

Steve Gury
+18  A: 

A race condition occurs when 2 or more threads are able to access shared data and they try to change it at the same time. Because the thread scheduling algorithm can swap between threads at any point, you don't know the order at which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the thread scheduling algorithm, i.e. both threads are 'racing' to access/change the data.

Often problems occur when one thread does a "check-then-act" (e.g. "check" if the value is X, and then "act" to do something that depends on the value being X) and another thread does something to the value in between the "check" and the "act".

if(x == 5) //The 'Check'
{
   y = x * 2; //The 'Act'
  //If x is changed by another thread in between the if(x==5) and the "y=x*5", y will not be equal to 10.
}

The point being, y could be 10, or it could be anything, depending on whether another thread changed x in between the check and act. You have no real way of knowing.

In order to prevent race conditions occuring, typically you would put a lock around the shared data to ensure that only one thread can access the data at a time. This would mean something like this:

//Obtain lock for x
if(x == 5)
{
   y = x * 2; //Now, nothing can change x until the lock is released. Therefore y = 10
}
//release lock for x
Lehane
What does the other thread do when it encounters the lock? Does it wait? Error?
Brian Ortiz
Yes, the other thread will have to wait until the lock is released before it can proceed. This makes it very important that the lock is released by the holding thread when it is finished with it. If it never releases it, then the other thread will wait indefinitely.
Lehane
A: 

A race condition is a situation on concurrent programming where two concurrent threads or processes and the resulting final state depends on who gets the resource first.

Jorge Córdoba
+2  A: 

Race conditions occur in multi-threaded applications or multi-process systems. A race condition, at its most basic, is anything that makes the assumption that two things not in the same thread or process will happen in a particular order, without taking steps to ensure that they do. This happens commonly when two threads are passing messages by setting and checking member variables of a class both can access. There's almost always a race condition when one thread calls sleep to give another thread time to finish a task (unless that sleep is in a loop, with some checking mechanism).

Tools for preventing race conditions are dependent on the language and OS, but some comon ones are mutexes, critical sections, and signals. Mutexes are good when you want to make sure you're the only one doing something. Signals are good when you want to make sure someone else has finished doing something. Minimizing shared resources can also help prevent unexpected behaviors

Detecting race conditions can be difficult, but there are a couple signs. Code which relies heavily on sleeps is prone to race conditions, so first check for calls to sleep in the affected code. Adding particularly long sleeps can also be used for debugging to try and force a particular order of events. This can be useful for reproducing the behavior, seeing if you can make it disappear by changing the timing of things, and for testing solutions put in place. The sleeps should be removed after debugging.

The signature sign that one has a race condition though, is if there's an issue that only occurs intermittently on some machines. Common bugs would be crashes and deadlocks. With logging, you should be able to find the affected area and work back from there.

tsellon
+3  A: 
Chris Conway
+6  A: 
privatehuff
See http://jakob.engbloms.se/archives/65 for an example of a program to test how oiften such things go bad... it really depends on the memory model of the machine you are running on.
jakobengblom2
Great explanation, thanks.
Philip Morton