views:

1348

answers:

22

I'm looking into educating our team on concurrency. What are the most common pitfalls developers fall into surrounding concurrency? For instance, in .NET the static keyword opens the door to a lot of concurrency issues.

Are there other design patterns that are not thread safe?

Update

There are so many great answers here it is difficult to select just one as the accepted answer. Be sure to scroll through them all for great tips.

+11  A: 

One is the race condition, which basically amounts to assuming that one piece of code will run before or after another concurrent piece of code.

There are also deadlocks: code A waits for code B to release resource Y, while code B waits for code A to release resource X.
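A minimal Java sketch of the race condition described above (class and field names are mine, not the answerer's): two threads increment a shared counter; the unsynchronized increment can lose updates because `counter++` is a separate load, add, and store, while the locked increment is always correct.

```java
// Two threads increment shared counters 100_000 times each.
// unsafeCounter++ is read-modify-write, so without synchronization
// the final value is often less than 200_000. safeCounter is
// protected by a lock and always ends at exactly 200_000.
public class RaceDemo {
    static int unsafeCounter = 0;
    static int safeCounter = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCounter++;            // racy: lost updates possible
                synchronized (lock) {
                    safeCounter++;          // mutually exclusive: always correct
                }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("unsafe=" + unsafeCounter + " safe=" + safeCounter);
    }
}
```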

Serhat Özgel
A: 

Some canonical pitfalls are deadlocks (two competing processes are stuck waiting for each other to release some resource) and race conditions (when the timing and/or dependence of events can lead to unexpected behavior). Here is a worthwhile video about "Multithreading Gotchas" as well.

Ben Hoffstein
crap, you beat me by 12 seconds. :)
siz
i won't make the obvious joke about a "race condition".
Ben Hoffstein
A: 

Deadlocks.

siz
+1  A: 

Here is a great resource about concurrency, specifically in Java: http://tech.puredanger.com/ Alex Miller lists many different issues one can encounter while dealing with concurrency. Highly recommended :)

xelurg
+7  A: 

One of the biggest pitfalls is the use of concurrency in the first place. Concurrency adds a substantial design/debugging overhead, so you really have to examine the problem and see if it really calls for concurrency.

Concurrency is certainly unavoidable in some domains, but when it is avoidable, avoid it.

James McMahon
+1  A: 

Calling out to client code (for example, raising an event) from within a lock, causing a deadlock:

public class ThreadedClass
{
    private object syncHandle = new object();
    private int state = 0;

    public event EventHandler Updated = delegate { };

    public void DoSomething()
    {
        lock (syncHandle)
        {
            // some locked code
            state = 1;

            // raising the event while holding the lock invites deadlock
            Updated(this, EventArgs.Empty);
        }
    }

    public int State
    {
        get
        {
            lock (syncHandle)
                return state;
        }
    }
}

You can't be certain what a client's event handler will do; most likely it will touch the State property, which takes the same lock, and if that happens from another thread you have a deadlock. Raise the event outside the lock instead:

public void DoSomething()
{
    lock (syncHandle)
    {
        // some locked code
        state = 1;
    }
    // raise the event outside the lock
    Updated(this, EventArgs.Empty);
}
bendewey
+9  A: 

Concurrency doesn't have many pitfalls.

Synchronizing access to shared data, however, is tricky.

Here are some questions anyone writing shared-data synchronization code should be able to answer:

  1. What is InterlockedIncrement?
  2. Why does InterlockedIncrement need to exist at an assembly language level?
  3. What is read/write reordering?
  4. What is the volatile keyword (in c++) and when do you need to use it?
  5. What is a synchronization hierarchy?
  6. What is the ABA problem?
  7. What is cache coherency?
  8. What is a memory barrier?

"Shared everything" concurrency is an extremely leaky abstraction. Adopt shared nothing message passing instead.
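For questions 1 and 2 above, a sketch of the Java analogue (class name is hypothetical): `AtomicInteger.incrementAndGet()` is Java's counterpart to Win32's InterlockedIncrement, and it exists for the same reason — a plain increment is three separate machine operations (load, add, store), so atomicity has to come from a hardware-level instruction such as LOCK XADD or a compare-and-swap.

```java
import java.util.concurrent.atomic.AtomicInteger;

// incrementAndGet() performs an atomic read-modify-write, so
// concurrent increments from many threads are never lost.
public class AtomicDemo {
    public static int countAtomically(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.incrementAndGet();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countAtomically(4, 50_000)); // always 200000
    }
}
```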

Mo Flanagan
-1 This laundry list is only relevant for programming in C++ (or C?) on MS-Windows.
starblue
This is relevant to many platforms and many languages. #2 has nothing to do with assembly language. #7 is probably more relevant on big iron SPARC and POWER architectures than on most MS-Windows systems. In fact, I don't see anything other than #4 that's only specific to C++.
vy32
A: 

You could also look at out-of-process concurrency problems. For instance: a writer process creating a file while a consumer process claims it.

Kb
A: 

Concurrency problems are incredibly difficult to debug. As a preventative measure, one can disallow all access to shared objects except through mutexes, structured so that programmers can easily follow the rules. I have seen this done by making wrappers around the OS-provided mutexes, semaphores, etc.

Here are some confusing examples from my past:

I used to develop printer drivers for Windows. In order to prevent multiple threads from writing to the printer at the same time, our port monitor used a construction like this:

// pseudo code because I can't remember the API
BOOL OpenPort()  { GrabCriticalSection(); }
BOOL ClosePort() { ReleaseCriticalSection(); }
BOOL WritePort() { writestuff(); }

Unfortunately, each call to WritePort was from a different thread in the spooler's thread pool. We eventually got into a situation where OpenPort and ClosePort were called by different threads, causing a deadlock. The solution for this is left as an exercise, because I can't remember what I did.

I also used to work on printer firmware. The printer in this case used an RTOS called uCOS (pronounced 'mucus'), so each function had its own task (print head motor, serial port, parallel port, network stack, etc.). One version of this printer had an internal option that plugged into a serial port on the printer motherboard. At some point, it was discovered that the printer would read the same result twice from this peripheral, and every value thereafter would be out of sequence. (E.g., the peripheral wrote the sequence 1,7,3,56,9,230 but we would see 1,7,3,3,56,9,230. These values were reported to the computer and put into a database, so having a pile of documents with wrong ID numbers was Very Bad.) The root cause was a failure to respect the mutex protecting the device's read buffer, hence my advice at the beginning of this response.

Matt Kane
+1  A: 

Double-checked locking is broken, at least in Java. Understanding why this is true, and how you can fix it, leads you deep into understanding concurrency issues and Java's memory model.
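A sketch of the post-Java-5 repair mentioned in the comments below (class name is illustrative): declaring the field volatile makes the idiom safe, because the volatile write prevents the reference from becoming visible before the constructor's writes.

```java
public class LazySingleton {
    // Without volatile, a thread could observe a non-null reference
    // to a partially constructed object: the write of the reference
    // may be reordered before the writes inside the constructor.
    private static volatile LazySingleton instance;

    private LazySingleton() { }

    public static LazySingleton getInstance() {
        LazySingleton local = instance;     // one volatile read on the fast path
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;
                if (local == null) {
                    instance = local = new LazySingleton();
                }
            }
        }
        return local;
    }
}
```

Note that every fast-path call still pays for a volatile read, which is why the comments call the whole idiom a premature optimization.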

Bill Michell
IIRC the Java memory model has changed and it's not "broken" anymore - just an utterly unnecessary premature optimization.
Michael Borgwardt
You can un-break it by declaring the variable as volatile or using an AtomicReference - but this forces a memory barrier every time it is accessed. Bare double-checked locking is definitely still broken.
Bill Michell
+3  A: 

As stated in other answers, the two most likely problems are deadlocks and race conditions. However my main advice would be that if you are looking to train a team on the subject of concurrency I would strongly recommend getting some training yourself. Get a good book on the subject, don't rely on a few paragraphs from a website. A good book would depend on the language you are using: "Java Concurrency in Practice" by Brian Goetz is good for that language, but there are plenty of others.

DJClayworth
+7  A: 

I teach concurrency a lot to my friends and co-workers. Here are some of the big pitfalls:

  • Assuming that a variable that is mostly read in many threads and only written in one thread doesn't need to be locked. (In Java, this situation may result in the reading threads never seeing the new value.)
  • Assuming that the threads will run in a particular order.
  • Assuming that the threads will run simultaneously.
  • Assuming that the threads will NOT run simultaneously.
  • Assuming that all of the threads will make forward progress before any one of the threads ends.

I also see:

  • Big confusions between thread_fork() and fork().
  • Confusions when memory is allocated in one thread and free()'ed in another thread.
  • Confusions resulting from the fact that some libraries are threadsafe and some are not.
  • People using spin-locks when they should use sleep & awake, or select, or whatever blocking mechanism your language supports.
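The first pitfall in the list above — a variable written in one thread and read in others, with the readers possibly never seeing the new value — can be sketched in Java like this (names are mine): a stop flag must be volatile (or lock-protected), otherwise the JIT may hoist the read out of the loop and the worker can spin forever.

```java
// A stop flag written by one thread and read by another.
// volatile guarantees the reader eventually sees the write.
public class StopFlag {
    private volatile boolean stop = false;

    public void requestStop() { stop = true; }

    public int runUntilStopped() {
        int iterations = 0;
        while (!stop) {        // without volatile, this read could be cached
            iterations++;
            Thread.yield();
        }
        return iterations;
    }
}
```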
vy32
So basically, assuming anything about the behavior of threads is bad. :)
Herms
You can assume that each thread will make forward progress at some point.
vy32
+1 I remember reading this answer months ago, and just now I found a test that assumed something about the operation of threads and I had to come find this post just to up-vote it.
Aren
Thanks! I'm glad I could be of service.
vy32
A: 

Some rules of thumb:

(1) Keep an eye on the context when you declare a variable

  • Writes to class (static) attributes have to be synchronized
  • Writes to instance attributes have to be synchronized
  • Keep all variables as local as possible (do not put them in member scope unless it makes sense)
  • Mark variables that are read-only as immutable

(2) Lock the access to mutable class or instance attributes: Variables that are part of the same invariant should be protected by the same lock.

(3) Avoid Double Checked Locking

(4) Hold locks across compound operations (calls into subroutines) that must appear atomic.

(5) Avoid busy waiting

(6) Keep the workload low in synchronized sections

(7) Do not hand control to client code while you are in a synchronized block.

(8) Comments! They really help in understanding what the other developer had in mind when declaring a section synchronized or a variable immutable.
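Rule (2) in a short Java sketch (class name is mine): `min` and `max` form a single invariant (`min <= max`), so both are guarded by the same lock. Guarding each with its own lock would let a reader observe `min > max` mid-update.

```java
// Two fields, one invariant (min <= max), one lock.
public class Range {
    private final Object lock = new Object();
    private int min = 0;
    private int max = 0;

    public void set(int newMin, int newMax) {
        if (newMin > newMax) throw new IllegalArgumentException();
        synchronized (lock) {        // both writes under one lock
            min = newMin;
            max = newMax;
        }
    }

    public boolean contains(int value) {
        synchronized (lock) {        // both reads under the same lock
            return min <= value && value <= max;
        }
    }
}
```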

olli
+2  A: 

In my experience, many (skilled) developers lack foundational knowledge about concurrency theory. The classic textbooks on Operating Systems by Tanenbaum or Stallings do a good job in explaining the theory and implications of concurrency: Mutual exclusion, synchronization, deadlocks and starvation. A good theoretical background is mandatory to successfully work with concurrency.

That being said, concurrency support varies greatly between programming languages and different libraries. Furthermore, test-driven development doesn't get you very far in detecting and solving concurrency problems (although transient test failures indicate concurrency issues).

Pankrat
+2  A: 

The #1 pitfall I have seen is too much data sharing.

I believe that one of the better ways to handle concurrency is multiple processes instead of threads. In this way, communication between threads/processes is strictly limited to the chosen pipe, message queue or other communication method.

Zan Lynx
+3  A: 

One truth to keep in mind is that even if the initial developers get their tasking model working properly (which is a big if), then the maintenance team that follows will surely screw things up in unimaginable ways. The take-away from that is to limit the traces of concurrency throughout your system. Do your best to make sure that most of your system is blissfully unaware that concurrency is occurring. That gives fewer opportunities for people unfamiliar with the tasking model to inadvertently screw things up.

Too often people go thread/task crazy. Everything works on its own thread. The end result is that nearly every piece of code has to be intimately aware of threading issues. It forces otherwise simple code to be littered with locking and synchronization obfuscations. Every time I’ve seen this, the system eventually becomes an unmaintainable mess. Yet every time I’ve seen this, the original developers still insist it was a good design :(

Like multiple inheritance, if you want to create a new thread/task then assume you are wrong until proven otherwise. I can’t even count the number of times that I’ve seen the pattern Thread A calls Thread B, then Thread B calls Thread C, then Thread C calls Thread D, all waiting for a response from the previous thread. All the code is doing is making long-winded function calls through different threads. Don’t use threads when function calls work just fine.

Always remember, Message Queues are your best friend when you want to work concurrently.

I have found that creating a core infrastructure that handles nearly all the concurrency issues works best. If there are any threads outside of the core infrastructure that must talk to another piece of software then they must go through the core infrastructure. That way the rest of the system can remain concurrency unaware and the concurrency issues can be handled by the person(s) who hopefully understand concurrency.
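The message-queue advice above can be sketched in Java with a `BlockingQueue` (names are mine): the producer and consumer share nothing but the queue, and all synchronization lives inside the queue implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Producer puts 1..n onto a bounded queue; a consumer thread sums
// them. -1 acts as a poison pill that tells the consumer to stop.
public class QueueDemo {
    public static int sumViaQueue(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        int[] total = new int[1];
        Thread consumer = new Thread(() -> {
            try {
                int value;
                while ((value = queue.take()) >= 0) {  // blocks until data arrives
                    total[0] += value;
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        for (int i = 1; i <= n; i++) queue.put(i);  // blocks when the queue is full
        queue.put(-1);                              // signal shutdown
        consumer.join();                            // join gives us visibility of total
        return total[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumViaQueue(100)); // 5050
    }
}
```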

Dunk
+10  A: 

There are lots of good answers and pointers on this thread already, but let me add one thing.

DO NOT RELY ON TESTING TO FIND RACE CONDITIONS AND DEADLOCKS

Assume you have all the good development processes in place: unit tests for each component, smoke tests for each nightly build, requiring each developer's changes to pass tests before check-in, and so on.

All that is well and good, but it leads to the attitude "well, it passed the test suite, so it can't be a bug in my code," which will not serve you well in concurrent programming. Real-time concurrency bugs are fiendishly hard to reproduce. You can run a piece of code with a race condition a billion times before it fails.

You will have to adjust your process to put greater emphasis on code reviews, conducted by your best minds. Having a separate code review devoted solely to concurrency issues is not a bad idea.

You will have to put more emphasis on making your application self-debugging. That is, when you get a failure in the test lab or at your customer's site, you need to make sure enough information is captured and logged to let you do a definitive postmortem, since the odds of your being able to reproduce the bug report at your convenience are negligible.

You will have to put more emphasis on paranoid sanity checks in your code so that a fault is detected as close to the problem as possible, and not 50,000 lines of code away.

Be paranoid. Very paranoid.

Die in Sente
So true! It's weird that people are the best debuggers when it comes to concurrency :)
Alex
+1 Race conditions and deadlocks are a pain to troubleshoot, and can sometimes only ever show up in "release" binaries. Having a good code review process in place is essential.
Jon Tackabury
+1  A: 

One concurrent programming pitfall is improper encapsulation leading to races and deadlocks. This can probably happen in lots of different ways, though there are two in particular that I've seen:

  1. Giving variables unnecessarily wide scope. For example, sometimes people declare a variable at instance scope when local scope would do. This can create the potential for races where none need exist.

  2. Exposing locks unnecessarily. If there's no need to expose a lock, then consider keeping it hidden away. Otherwise clients may use it and create deadlocks that you could have prevented.

Here's a simple example of #1 above, one that's pretty close to something I saw in production code:

public class CourseService {
    private CourseDao courseDao;
    private List<Course> courses;   // instance scope is unnecessary here

    public List<Course> getCourses() {
        this.courses = courseDao.getCourses();
        return this.courses;
    }
}

In this example there's no need for the courses variable to have instance scope, and now concurrent calls to getCourses() can have races.
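A sketch of the fix (the DAO is stubbed here for illustration; it is not the original poster's code): making `courses` a local means each call gets its own reference and nothing is shared.

```java
import java.util.List;

// 'courses' is now local to getCourses(), so concurrent calls
// no longer race on shared instance state.
public class CourseService {
    interface CourseDao { List<String> getCourses(); }  // stub for the example

    private final CourseDao courseDao;

    public CourseService(CourseDao dao) { this.courseDao = dao; }

    public List<String> getCourses() {
        List<String> courses = courseDao.getCourses();  // local scope
        return courses;
    }
}
```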

Willie Wheeler
Would private List courses have to be static for this to be a problem?
Nope. If a single CourseService instance is shared by multiple threads, then it's possible for races to occur.
Willie Wheeler
+2  A: 

It all comes down to shared data/shared state. If you share no data or state then you have no concurrency problems.

Most people, when they think of concurrency, think of multi-threading in a single process.

One way to think about this is, what happens if you split your process into multiple processes. Where do they have to communicate with each other? If you can be clear on where the processes have to communicate with each other then you have a good idea about the data they share.

Now, as mental test, move those multiple processes onto individual machines. Are your communication patterns still correct? Can you still see how to make it work? If not, one might want to reconsider multiple threads.

(The rest of this doesn't apply to Java threading, which I don't use and therefore know little about).

The other place where one can get caught is this: if you use locks to protect shared data, you should write a lock monitor that can find deadlocks for you. Then you need to have your program(s) deal with the possibility of deadlocks. When you get a deadlock error you have to release all of your locks, back up, and try again.

Otherwise, you are unlikely to make multiple locks work well without a level of care that is quite rare in real systems.
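One way to sketch "release all of your locks, back up, and try again" in Java (names are mine): acquire with `tryLock`, and if the second lock is unavailable, release the first and retry, so no thread ever holds one lock while blocking on another.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

// Deadlock-avoiding acquisition of two locks: never block on lock b
// while holding lock a. A short random backoff reduces livelock.
public class TwoLockTransfer {
    public static void withBothLocks(ReentrantLock a, ReentrantLock b, Runnable action)
            throws InterruptedException {
        while (true) {
            if (a.tryLock()) {
                try {
                    if (b.tryLock()) {
                        try { action.run(); return; }
                        finally { b.unlock(); }
                    }
                } finally { a.unlock(); }   // back up: release a if b was busy
            }
            Thread.sleep(ThreadLocalRandom.current().nextInt(5));  // then try again
        }
    }
}
```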

Good luck!

Bruce ONeel
A: 

This isn't a pitfall, but more of a tip based on others' responses. The .NET framework's ReaderWriterLockSlim will dramatically improve your performance in many cases over the lock statement, and it can be configured to support reentrancy.

Steve
A: 

Composability. In any non-trivial system, ad-hoc approaches to synchronisation within different sub-systems make interaction between them error-prone, and occasionally impossible. See this video for an example of how even the most trivial code is prone to these problems.

Personally, I am a convert to the Actor model of concurrent computation (the asynchronous variety).

Nick Gunn
+1  A: 

Just found this paper, sounds interesting: A Study of Common Pitfalls in Simple Multi-Threaded Programs

Rubens Farias