I have some C# class libraries that were designed without taking into account things like concurrency, multiple threads, locks, etc...

The code is very well structured and easily extensible, but it could benefit a lot from multithreading: it's a set of scientific/engineering libraries that need to perform billions of calculations in a very short time, and right now they don't take advantage of the available cores.

I want to transform all this code into a set of multithreaded libraries, but I don't know where to start and I don't have any previous experience.

I could use any available help and any recommendations or suggestions.

A: 

The best place to start is probably http://msdn.microsoft.com/en-us/concurrency/default.aspx

Good luck!

Diego Mijelshon
+4  A: 

I'd highly recommend looking into .NET 4 and the Task Parallel Library (also available in .NET 3.5sp1 via the Rx Framework).

It makes many concurrency issues much simpler; in particular, data parallelism becomes dramatically simpler. Since most scientific/engineering libraries deal with large datasets, data parallelism is often the way to go...
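As a hedged sketch of the data-parallel style described above (the `Compute` kernel below is a hypothetical stand-in for a real scientific calculation), a `Parallel.For` loop or a PLINQ query can spread independent per-element work across the available cores:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class DataParallelSketch
{
    // Hypothetical stand-in for a real per-element scientific computation.
    static double Compute(double x) => Math.Sqrt(x) * Math.Sin(x);

    static void Main()
    {
        double[] input = Enumerable.Range(1, 1_000_000).Select(i => (double)i).ToArray();
        double[] output = new double[input.Length];

        // Parallel.For partitions the index range across the available cores;
        // each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, input.Length, i => output[i] = Compute(input[i]));

        // PLINQ expresses the same data parallelism in query form.
        double sum = input.AsParallel().Select(Compute).Sum();
        Console.WriteLine(sum);
    }
}
```

This only pays off when each element's work is independent; shared mutable state inside the loop body would reintroduce exactly the concurrency hazards discussed in the other answers.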

For some reference material, especially on data parallelism and background about decomposing and approaching the problem, you might want to read my blog series on Parallelism in .NET 4.

Reed Copsey
Thank you Reed.
ileon
@Reed Copsey: "data parallelism becomes dramatically simpler." Can you be more specific? I feel like only the execution of parallel tasks has become simpler... do you say that data parallelism becomes simpler because you can execute parallel tasks for each item in your collection, because of PLINQ or because there is something that makes the reduction of contention simpler?
Lirik
+1. I haven't tried the .NET 4 task/threading stuff yet, but I listened to a podcast and a lot of effort has been put into making it easier, both from a tools/IDE point of view and in the API abstraction. I'm looking forward to digging in when I need to. @ipapasa, beware that Reed's suggestion is pretty much @John Saunders' suggestion. Play before you port the goods.
kenny
@Lirik: Data parallelism becomes much simpler due to new constructs like the Parallel class, but even more so due to automatic handling of many aggregation scenarios (either via PLINQ or the parallel loop constructs with thread-local state), and also due to the introduction of many new supporting concurrent data structures.
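A hedged sketch of the thread-local-state aggregation described in the comment above (the data here is invented for illustration): each worker thread accumulates into its own partial sum, so the shared total is touched only once per thread instead of once per element:

```csharp
using System;
using System.Threading.Tasks;

class LocalStateAggregation
{
    static void Main()
    {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.Length; i++) data[i] = i * 0.001;

        double total = 0;
        object sync = new object();

        // The local-state overload of Parallel.For gives each worker thread
        // its own running sum, minimizing contention on the shared total.
        Parallel.For(0, data.Length,
            () => 0.0,                                 // per-thread initial state
            (i, loopState, local) => local + data[i],  // accumulate locally, no lock
            local => { lock (sync) total += local; }); // combine once per thread

        Console.WriteLine(total);
    }
}
```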
Reed Copsey
+6  A: 

My recommendation would be to not do it. You didn't write that code to be used in parallel, so it's not going to work, and it's going to fail in ways that will be difficult to debug.

Instead, I recommend you decide ahead of time which part of that code can benefit the most from parallelism, and then rewrite that code, from scratch, to be parallel. You can take advantage of having the unmodified code in front of you, and can also take advantage of existing automated tests.

It's possible that using the .NET 4.0 Task Parallel Library will make the job easier, but it's not going to completely bridge the gap between code that was not designed to be parallel and code that is.

John Saunders
John, you're scaring me! But then again, you have a point there. By saying that the code is well structured, I imply that certain parts - algorithms - could perhaps be rewritten from scratch.
ileon
@ipapasa: I mean to scare you. I've debugged stuff like this, and maybe you haven't. This is the kind of stuff to give you nightmares. This stuff is difficult when you _do_ design it from scratch. I've seen _operating system_ code screw this up! All due respect, but I don't think you're likely to do that well.
John Saunders
@John: I appreciate what you say and, to be honest, I take it under serious consideration.
ileon
@John: I find this incredibly short-sighted. It's no longer acceptable to ignore parallelism - it should be a tool in every professional programmer's toolbox, and most serial code has some opportunity to exploit parallelism without causing issues. Scientific code, in general, is highly data-centric, and often quite straightforward to work with using data parallelism techniques. I agree that care must be taken, but feel that it SHOULD be done.
Reed Copsey
@Reed: I didn't say to ignore parallelism - I said he should write his code with parallelism in mind, not morph sequential code into parallel code using a developer base not accustomed to concurrency issues.
John Saunders
@John: The thing is - at some point, you're going to have to start trying to exploit parallelism in code that was written serially. Sometimes this requires a rewrite, but very frequently, there are (at least small) gains to be made without a full redesign or rewrite.
Reed Copsey
@Reed: that may apply to developers experienced with concurrency issues. That does not appear to apply to the OP. I strongly suspect that if they try to parallelize this existing, familiar code, they will be lulled into missing what they need to do differently. If instead, they focus on learning how to create high-quality parallel algorithms, they will then be able to take their familiarity with the existing code and apply it to a solid parallel foundation.
John Saunders
@John, @Reed: your contribution is valuable. What I understand from this conversation is that (a) it's not an easy task; lack of knowledge may lead to nightmares, as John said (so we need strong reinforcement), (b) scientific code can benefit from data parallelism and we should go after it (Reed), (c) if we want to delve into the concurrency issues, obviously we should change the way we think and write our code (John).
ileon
+1  A: 

If you don't have any previous experience in multithreading then I would recommend that you get the basics first by looking at the various resources: http://stackoverflow.com/questions/540242/book-or-resource-on-c-concurrency

Making your entire library multithreaded requires a brand-new architectural approach. If you simply go around putting locks everywhere in your code, you'll end up making it very cumbersome, and you might not achieve any performance increase at all.

The best concurrent software is lock-free and wait-free... this is difficult to achieve in C#/.NET, since most of the built-in collections are not lock-free, wait-free, or even thread-safe. There are various discussions on lock-free data structures. A lot of people have referenced Boyet's articles (which are REALLY good), and some people have been throwing around the Task Parallel Library as the next thing in .NET concurrency, but the TPL really doesn't give you much in terms of thread-safe collections.
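As a small, hedged illustration of the lock-free style mentioned above: the `Interlocked` class provides atomic read-modify-write operations, which are the usual building blocks of lock-free code in .NET:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LockFreeCounter
{
    static void Main()
    {
        long counter = 0;

        // Interlocked.Increment is an atomic read-modify-write: no thread
        // ever blocks on a lock, yet no increments are lost.
        Parallel.For(0, 1_000_000, _ => Interlocked.Increment(ref counter));

        Console.WriteLine(counter); // 1000000
    }
}
```

Real lock-free data structures are considerably harder than a counter; this only shows the primitive they are built from.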

.NET 4.0 is coming out with the System.Collections.Concurrent namespace, which should help a lot.
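For instance, here is a hedged sketch using `ConcurrentDictionary` from those new collections (the bucketing scheme is invented for illustration): many threads can update it with no explicit locks in user code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConcurrentCollectionsSketch
{
    static void Main()
    {
        // ConcurrentDictionary does its own fine-grained synchronization.
        var counts = new ConcurrentDictionary<int, int>();

        Parallel.For(0, 100_000, i =>
        {
            int bucket = i % 10;
            // AddOrUpdate retries atomically if another thread races us.
            counts.AddOrUpdate(bucket, 1, (key, old) => old + 1);
        });

        foreach (var pair in counts)
            Console.WriteLine($"bucket {pair.Key}: {pair.Value}");
    }
}
```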

Making your entire library concurrent is not recommended, since it wasn't designed with concurrency in mind from the start. Your best option is to go through the library, identify which portions of it are actually good candidates for multithreading, and then pick the best concurrency solution for each and implement it. The main thing to remember is that when you write multithreaded code, the concurrency should result in increased throughput for your program. If increased throughput is not achieved (i.e. the throughput merely matches, or is less than, that of the sequential version), then you should simply not use concurrency in that code.
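That last point is worth measuring rather than assuming. A hedged sketch (the `Work` kernel is a hypothetical CPU-bound candidate, not code from the question) of timing the sequential and parallel versions before committing to either:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class MeasureBeforeCommitting
{
    // Hypothetical CPU-bound kernel identified as a parallelism candidate.
    static double Work(int i)
    {
        double acc = 0;
        for (int k = 1; k <= 200; k++) acc += Math.Sqrt(i + k);
        return acc;
    }

    static void Main()
    {
        const int n = 200_000;

        var sw = Stopwatch.StartNew();
        double sequential = Enumerable.Range(0, n).Select(Work).Sum();
        long seqMs = sw.ElapsedMilliseconds;

        sw.Restart();
        double parallel = Enumerable.Range(0, n).AsParallel().Select(Work).Sum();
        long parMs = sw.ElapsedMilliseconds;

        // Keep the parallel version only if it actually wins on this hardware.
        Console.WriteLine($"sequential: {seqMs} ms, parallel: {parMs} ms");
    }
}
```

If the kernel is too cheap per element, scheduling overhead can make the parallel version slower, which is exactly the match-or-worse case where concurrency should be dropped.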

Lirik
Thank you Lirik.
ileon