views:

1219

answers:

11

Many projects I work on have poor threading implementations and I am the sucker who has to track these down. Is there an accepted best way to handle threading. My code is always waiting for an event that never fires.

I'm kinda thinking like a design pattern or something.

+10  A: 

Learning to write multi-threaded programs correctly is extremely difficult and time consuming.

So the first step is: replace the implementation with one that doesn't use multiple threads at all.

Then carefully put threading back in if, and only if, you discover a genuine need for it, when you've figured out some very simple safe ways to do so. A non-threaded implementation that works reliably is far better than a broken threaded implementation.

When you're ready to start, favour designs that use thread-safe queues to transfer work items between threads and take care to ensure that those work items are accessed only by one thread at a time.

Try to avoid just spraying lock blocks around your code in the hope that it will become thread-safe. It doesn't work. Eventually, two code paths will acquire the same locks in a different order, and everything will grind to a halt (once every two weeks, on a customer's server). This is especially likely if you combine threads with firing events, and you hold the lock while you fire the event - the handler may take out another lock, and now you have a pair of locks held in a particular order. What if they're taken out in the opposite order in some other situation?

In short, this is such a big and difficult subject that I think it is potentially misleading to give a few pointers in a short answer and say "Off you go!" - I'm sure that's not the intention of the many learned people giving answers here, but that is the impression many get from summarised advice.

Instead, buy this book.

Here is a very nicely worded summary from this site:

Multithreading also comes with disadvantages. The biggest is that it can lead to vastly more complex programs. Having multiple threads does not in itself create complexity; it's the interaction between the threads that creates complexity. This applies whether or not the interaction is intentional, and can result long development cycles, as well as an ongoing susceptibility to intermittent and non-reproducable bugs. For this reason, it pays to keep such interaction in a multi-threaded design simple – or not use multithreading at all – unless you have a peculiar penchant for re-writing and debugging!

Daniel Earwicker
Would enjoy reading comments about downvotes!
Daniel Earwicker
Looks like a common problem with downvotes of good comments. I guess someone has anger issues.
Ruslan
I haven't noticed that generally, but definitely with threading. I think people don't like to hear that it's extremely difficult. But it just is, and most code examples on the web are misleading IMO.
Daniel Earwicker
+1. I voted up. It's a good advice on how to deal with unstable threaded code.
Wouter van Nifterick
I didn't downvote, but maybe some people felt that "replace the implementation with one that doesn't use multiple threads at all" was sort of dodging the question a bit?
Peter
My response to that would be "Not at all". If you have a bunch of broken thread safe code and you're just starting out in this domain, you have only one option to get the code into a known state: remove the threading.
Daniel Earwicker
Not downvoting, because your answer is partly true, but a multithreaded program should be buil as such from the beginning. The "we'll tack on threading later" is just as stupid as tacking on any core feature latter, dangerous and even more work than doing it right from the start
Robert Gould
@Robert Gould - I completely agree. The multi-threading design would be a comprehensive refactoring of the earlier design. The "tacked on" approach usually results in lock blocks being spread around.
Daniel Earwicker
@Earwicker: I am often mystified at downvotes I experience and see. It's just the medium, and some people downvoting something that strikes them the wrong way.
Eddie
+21  A: 

(Assuming .NET; similar things would apply for other platforms.)

Well, there are lots of things to consider. I'd advise:

  • Immutability is great for multi-threading. Functional programming works well concurrently partly due to the emphasis on immutability.
  • Use locks when you access mutable shared data, both for reads and writes.
  • Don't try to go lock-free unless you really have to. Locks are expensive, but rarely the bottleneck.
  • Monitor.Wait should almost always be part of a condition loop, waiting for a condition to become true and waiting again if it's not.
  • Try to avoid holding locks for longer than you need to.
  • If you ever need to acquire two locks at once, document the ordering thoroughly and make sure you always use the same order.
  • Document the thread-safety of your types. Most types don't need to be thread-safe, they just need to not be thread hostile (i.e. "you can use them from multiple threads, but it's your responsible for taking out locks if you want to share them)
  • Don't access the UI (except in documented thread-safe ways) from a non-UI thread. In Windows Forms, use Control.Invoke/BeginInvoke

That's off the top of my head - I probably think of more if this is useful to you, but I'll stop there in case it's not.

Jon Skeet
A very nice list. Maybe a point like "Try to avoid calling out to external code while holding a lock." should be added as well.
mghie
Top of list should be - DON'T make it multithreaded unless there is a real need to.Second should be - Keep the threading simple, like long running tasks run on a single background thread so the UI stays responsive.
pipTheGeek
@pipTheGeek - amen!
Daniel Earwicker
I think you forgot to add (I think you've said this elsewhere) -- If you're writing a web application (ASP.NET), don't use explicit threading unless you really need to. Let the server handle threading individual requests for you. Otherwise, it won't scale.
RussellH
+4  A: 

BIG emphasis on the first point that Jon posted. The more immutable state that you have (ie: globals that are const, etc...), the easier your life is going to be (ie: the fewer locks you'll have to deal with, the less reasoning you'll have to do about interleaving order, etc...)

Also, often times if you have small objects to which you need multiple threads to have access, you're sometimes better off copying it between threads rather than having a shared, mutable global that you have to hold a lock to read/mutate. It's a tradeoff between your sanity and memory efficiency.

Aaron
Not just your sanity, but often performance as well - shared objects need locking, which kills scalability. An alternative to copying or immutability is to strive to ensure the object is only visible to one thread at a time, which (if correct) allows safe, lock-free mutation of the work items.
Daniel Earwicker
+1  A: 

I'd like to follow up with Jon Skeet's advice with a couple more tips:

  • If you are writing a "server", and are likely to have a high amount of insert parallelism, don't use Microsoft's SQL Compact. It's lock manager is stupid. If you do use SQL Compact, DON'T use serializable transactions (which happens to be the default for the TransactionScope class). Things will fall apart on you rapidly. SQL Compact doesn't support temporary tables, and when you try to simulate them inside of serialized transactions it does rediculsouly stupid things like take x-locks on the index pages of the _sysobjects table. Also it get's really eager about lock promotion, even if you don't use temp tables. If you need serial access to multiple tables , your best bet is to use repeatable read transactions(to give atomicity and integrity) and then implement you own hierarchal lock manager based on domain-objects (accounts, customers, transactions, etc), rather than using the database's page-row-table based scheme.

    When you do this, however, you need to be careful (like John Skeet said) to create a well defined lock hierarchy.

  • If you do create your own lock manager, use <ThreadStatic> fields to store information about the locks you take, and then add asserts every where inside the lock manager that enforce your lock hierarchy rules. This will help to root out potential issues up front.

  • In any code that runs in a UI thread, add asserts on !InvokeRequired (for winforms), or Dispatcher.CheckAccess() (for WPF). You should similarly add the inverse assert to code that runs in background threads. That way, people looking at a method will know, just by looking at it, what it's threading requirements are. The asserts will also help to catch bugs.

  • Assert like crazy, even in retail builds. (that means throwing, but you can make your throws look like asserts). A crash dump with an exception that says "you violated threading rules by doing this", along with stack traces, is much easier to debug then a report from a customer on the other side of the world that says "every now and then the app just freezes on me, or it spits out gobbly gook".

Scott Wisniewski
+1  A: 

Adding to the points that other folks have already made here:


Some developers seem to think that "almost enough" locking is good enough. It's been my experience that the opposite can be true -- "almost enough" locking can be worse than enough locking.

Imagine thread A locking resource R, using it, and then unlocking it. A then uses resource R' without a lock.

Meanwhile, thread B tries to access R while A has it locked. Thread B is blocked until thread A unlocks R. Then the CPU context switches to thread B, which accesses R, and then updates R' during its time slice. That update renders R' inconsistent with R, causing a failure when A tries to access it.


Test on as many different hardware and OS architectures as possible. Different CPU types, different numbers of cores and chips, Windows/Linux/Unix, etc.


The first developer who worked with multi-threaded programs was a guy named Murphy.

Dan Breslau
You also need to be careful about too much locking.Locking too much, or "at the wrong level" can devastate performance.
Scott Wisniewski
@Scott: Or lead to deadlocks
jalf
Jon Skeet (who?) addressed deadlocks: Always lock in the same order. (I've found it helpful to consider the granularity of a lock, always going from larger-grained to smaller-grained when more than 1 lock is needed.)But avoid prematurely optimizing. Lock first, ask questions later.
Dan Breslau
+4  A: 
jalf
Yes, the CSP book should be required reading!
Daniel Earwicker
+4  A: 

(Like Jon Skeet, much of this assumes .NET)

At the risk of seeming argumentative, comments like these just bother me:

Learning to write multi-threaded programs correctly is extremely difficult and time consuming.

Threads should be avoided when possible...

It is practically impossible to write software that does anything significant without leveraging threads in some capacity. If you are on Windows, open your Task Manager, enable the Thread Count column, and you can probably count on one hand the number of processes that are using a single thread. Yes, one should not simply use threads for the sake of using threads nor should it be done cavalierly, but frankly, I believe these cliches are used too often.

If I had to boil multithreaded programming down for the true novice, I would say this:

  • Before jumping into it, first understand that the the class boundary is not the same as a thread boundary. For example, if a callback method on your class is called by another thread (e.g., the AsyncCallback delegate to the TcpListener.BeginAcceptTcpClient() method), understand that the callback executes on that other thread. So even though the callback occurs on the same object, you still have to synchronize access to the members of the object within the callback method. Threads and classes are orthogonal; it is important to understand this point.
  • Identify what data needs to be shared between threads. Once you have defined the shared data, try to consolidate it into a single class if possible.
  • Limit the places where the shared data can be written and read. If you can get this down to one place for writing and one place for reading, you will be doing yourself a tremendous favor. This is not always possible, but it is a nice goal to shoot for.
  • Obviously make sure you synchronize access to the shared data using the Monitor class or the lock keyword.
  • If possible, use a single object to synchronize your shared data regardless of how many different shared fields there are. This will simplify things. However, it may also overly constrain things too, in which case, you may need a synchronization object for each shared field. And at this point, using immutable classes becomes very handy.
  • If you have one thread that needs to signal another thread(s), I would strongly recommend using the ManualResetEvent class to do this instead of using events/delegates.

To sum up, I would say that threading is not difficult, but it can be tedious. Still, a properly threaded application will be more responsive, and your users will be most appreciative.

EDIT: There is nothing "extremely difficult" about ThreadPool.QueueUserWorkItem(), asynchronous delegates, the various BeginXXX/EndXXX method pairs, etc. in C#. If anything, these techniques make it much easier to accomplish various tasks in a threaded fashion. If you have a GUI application that does any heavy database, socket, or I/O interaction, it is practically impossible to make the front-end responsive to the user without leveraging threads behind the scenes. The techniques I mentioned above make this possible and are a breeze to use. It is important to understand the pitfalls, to be sure. I simply believe we do programmers, especially younger ones, a disservice when we talk about how "extremely difficult" multithreaded programming is or how threads "should be avoided." Comments like these oversimplify the problem and exaggerate the myth when the truth is that threading has never been easier. There are legitimate reasons to use threads, and cliches like this just seem counterproductive to me.

Matt Davis
I love it - something is said so often (because it's true), which means we can call it a "cliche"!
Daniel Earwicker
These things would be even better with memory isolation (ie, not threads) to prevent programmers from accidentally using data in the wrong thread. A VM should actually be able to mark data state as no-access, read-only, thread-locked and such, at least in debug mode. That would be helpful.
Zan Lynx
A: 

Well, everyone thus far has been Windows / .NET centric, so I'll chime in with some Linux / C.

Avoid futexes at all costs(PDF), unless you really, really need to recover some of the time spent with mutex locks. I am currently pulling my hair out with Linux futexes.

I don't yet have the nerve to go with practical lock free solutions, but I'm rapidly approaching that point out of pure frustration. If I could find a good, well documented and portable implementation of the above that I could really study and grasp, I'd probably ditch threads completely.

I have come across so much code lately that uses threads which really should not, its obvious that someone just wanted to profess their undying love of POSIX threads when a single (yes, just one) fork would have done the job.

I wish that I could give you some code that 'just works', 'all the time'. I could, but it would be so silly to serve as a demonstration (servers and such that start threads for each connection). In more complex event driven applications, I have yet (after some years) to write anything that doesn't suffer from mysterious concurrency issues that are nearly impossible to reproduce. So I'm the first to admit, in that kind of application, threads are just a little too much rope for me. They are so tempting and I always end up hanging myself.

Tim Post
Forex: if you have threads pulling multiple work items off a queue, pull different numbers per thread so they do not all come back for more work at the same time.
Zan Lynx
Weird, the comment box erased the first line of my comment. I said, futexes only go slow when contented. Good threaded design avoids contention.
Zan Lynx
+2  A: 

Looking for a design pattern when dealing with threads is the really best approach to start with. It's too bad that many people don't try it, instead attempting to implement less or more complex multithreaded constructs on their own.

I would probably agree with all opinions posted so far. In addition, I'd recommend to use some existing more coarse-grained frameworks, providing building blocks rather than simple facilities like locks, or wait/notify operations. For Java, it would be simply the built-in java.util.concurrent package, which gives you ready-to-use classes you can easily combine to achieve a multithreaded app. The big advantage of this is that you avoid writing low-level operations, which results in hard-to-read and error-prone code, in favor of a much clearer solution.

From my experience, it seems that most concurrency problems can be solved in Java by using this package. But, of course, you always should be careful with multithreading, it's challenging anyway.

Bartosz Klimek
A: 

Instead of locking on containers, you should use ReaderWriterLockSlim. This gives you database like locking - an infinite number of readers, one writer, and the possibility of upgrading.

As for design patterns, pub/sub is pretty well established, and very easy to write in .NET (using the readerwriterlockslim). In our code, we have a MessageDispatcher object that everyone gets. You subscribe to it, or you send a message out in a completely asynchronous manner. All you have to lock on is the registered functions and any resources that they work on. It makes multithreading much easier.

Steve
+1  A: 
Julien Chastang