I have some C# class libraries that were designed without taking into account things like concurrency, multiple threads, locks, etc...

The code is very well structured and easily extensible, but it could benefit a lot from multithreading: it's a set of scientific/engineering libraries that need to perform billions of calculations in a very short time, and right now they don't take advantage of the available cores.

I want to transform all this code into a set of multithreaded libraries, but I don't know where to start and I don't have any previous experience.

I could use any available help and any recommendations or suggestions.

A: 

The best place to start is probably http://msdn.microsoft.com/en-us/concurrency/default.aspx

Good luck!

Diego Mijelshon
+4  A: 

I'd highly recommend looking into .NET 4 and the Task Parallel Library (also available in .NET 3.5sp1 via the Rx Framework).

It makes many concurrency issues much simpler; in particular, data parallelism becomes dramatically simpler. Since most scientific/engineering libraries deal with large datasets, data parallelism is often the way to go...
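As a hedged sketch of the data-parallel style described above (the `Compute` kernel below is a hypothetical stand-in for a real scientific calculation), a `Parallel.For` loop or a PLINQ query can spread independent per-element work across the available cores:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class DataParallelSketch
{
    // Hypothetical stand-in for a real per-element scientific computation.
    static double Compute(double x) => Math.Sqrt(x) * Math.Sin(x);

    static void Main()
    {
        double[] input = Enumerable.Range(1, 1_000_000).Select(i => (double)i).ToArray();
        double[] output = new double[input.Length];

        // Parallel.For partitions the index range across the available cores;
        // each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, input.Length, i => output[i] = Compute(input[i]));

        // PLINQ expresses the same data parallelism in query form.
        double sum = input.AsParallel().Select(Compute).Sum();
        Console.WriteLine(sum);
    }
}
```

This only pays off when each element's work is independent; shared mutable state inside the loop body would reintroduce exactly the concurrency hazards discussed in the other answers.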

For some reference material, especially on data parallelism and background about decomposing and approaching the problem, you might want to read my blog series on Parallelism in .NET 4.

Reed Copsey
Thank you Reed.
ileon
@Reed Copsey: "data parallelism becomes dramatically simpler." Can you be more specific? I feel like only the execution of parallel tasks has become simpler... do you say that data parallelism becomes simpler because you can execute parallel tasks for each item in your collection, because of PLINQ or because there is something that makes the reduction of contention simpler?
Lirik
+1. I haven't tried the .NET 4 task/threading stuff yet, but I listened to a podcast and a lot of effort has been put into making it easier, both from a tools/IDE point of view and in the API abstraction. I'm looking forward to digging in when I need to. @ipapasa, beware that Reed's suggestion is pretty much @John Saunders' suggestion. Play before you port the goods.
kenny
@Lirik: Data parallelism becomes much simpler due to new constructs like the Parallel class, but even more so due to automatic handling of many aggregation scenarios (either via PLINQ or the parallel loop constructs with thread-local state), and also due to the introduction of many new supporting concurrent data structures.
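A hedged sketch of the thread-local-state aggregation described in the comment above (the data here is invented for illustration): each worker thread accumulates into its own partial sum, so the shared total is touched only once per thread instead of once per element:

```csharp
using System;
using System.Threading.Tasks;

class LocalStateAggregation
{
    static void Main()
    {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.Length; i++) data[i] = i * 0.001;

        double total = 0;
        object sync = new object();

        // The local-state overload of Parallel.For gives each worker thread
        // its own running sum, minimizing contention on the shared total.
        Parallel.For(0, data.Length,
            () => 0.0,                                 // per-thread initial state
            (i, loopState, local) => local + data[i],  // accumulate locally, no lock
            local => { lock (sync) total += local; }); // combine once per thread

        Console.WriteLine(total);
    }
}
```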
Reed Copsey
+6  A: 

My recommendation would be to not do it. You didn't write that code to be used in parallel, so it's not going to work, and it's going to fail in ways that will be difficult to debug.

Instead, I recommend you decide ahead of time which part of that code can benefit the most from parallelism, and then rewrite that code, from scratch, to be parallel. You can take advantage of having the unmodified code in front of you, and can also take advantage of existing automated tests.

It's possible that using the .NET 4.0 Task Parallel Library will make the job easier, but it's not going to completely bridge the gap between code that was not designed to be parallel and code that is.

John Saunders
John, you're scaring me! But then again, you have a point there. By saying that the code is well structured, I imply that certain parts - algorithms - could perhaps be rewritten from scratch.
ileon
@ipapasa: I mean to scare you. I've debugged stuff like this, and maybe you haven't. This is the kind of stuff to give you nightmares. This stuff is difficult when you _do_ design it from scratch. I've seen _operating system_ code screw this up! All due respect, but I don't think you're likely to do that well.
John Saunders
@John: I appreciate what you say and, to be honest, I take it under serious consideration.
ileon
@John: I find this incredibly short-sighted. It's no longer acceptable to ignore parallelism - it should be a tool in every professional programmer's toolbox, and most serial code has some opportunity to exploit parallelism without causing issues. Scientific code, in general, is highly data-centric, and often quite straightforward to work with using data parallelism techniques. I agree that care must be taken, but feel that it SHOULD be done.
Reed Copsey
@Reed: I didn't say to ignore parallelism - I said he should write his code with parallelism in mind, not morph sequential code into parallel code using a developer base not accustomed to concurrency issues.
John Saunders
@John: The thing is - at some point, you're going to have to start trying to exploit parallelism in code that was written serially. Sometimes this requires a rewrite, but very frequently, there are (at least small) gains to be made without a full redesign or rewrite.
Reed Copsey
@Reed: that may apply to developers experienced with concurrency issues. That does not appear to apply to the OP. I strongly suspect that if they try to parallelize this existing, familiar code, they will be lulled into missing what they need to do differently. If instead, they focus on learning how to create high-quality parallel algorithms, they will then be able to take their familiarity with the existing code and apply it to a solid parallel foundation.
John Saunders
@John, @Reed: your contribution is valuable. What I understand from this conversation is that (a) it's not an easy task; lack of knowledge may lead to nightmares, as John said (so we need strong reinforcement), (b) scientific code can benefit from data parallelism and we should go after it (Reed), (c) if we want to delve into the concurrency issues, obviously we should change the way we think and write our code (John).
ileon
+1  A: 

If you don't have any previous experience in multithreading then I would recommend that you get the basics first by looking at the various resources: http://stackoverflow.com/questions/540242/book-or-resource-on-c-concurrency

Making your entire library multithreaded requires a brand-new architectural approach. If you simply go around putting locks everywhere in your code, you'll end up making it very cumbersome, and you might not achieve any performance increase at all.

The best concurrent software is lock-free and wait-free... this is difficult to achieve in C#/.NET, since most of the built-in collections are not lock-free, wait-free, or even thread-safe. There are various discussions on lock-free data structures. A lot of people have referenced Boyet's articles (which are REALLY good), and some people have been throwing around the Task Parallel Library as the next thing in .NET concurrency, but the TPL really doesn't give you much in terms of thread-safe collections.
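As a small, hedged illustration of the lock-free style mentioned above: the `Interlocked` class provides atomic read-modify-write operations, which are the usual building blocks of lock-free code in .NET:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LockFreeCounter
{
    static void Main()
    {
        long counter = 0;

        // Interlocked.Increment is an atomic read-modify-write: no thread
        // ever blocks on a lock, yet no increments are lost.
        Parallel.For(0, 1_000_000, _ => Interlocked.Increment(ref counter));

        Console.WriteLine(counter); // 1000000
    }
}
```

Real lock-free data structures are considerably harder than a counter; this only shows the primitive they are built from.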

.NET 4.0 is coming out with the System.Collections.Concurrent namespace, which should help a lot.
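For instance, here is a hedged sketch using `ConcurrentDictionary` from those new collections (the bucketing scheme is invented for illustration): many threads can update it with no explicit locks in user code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ConcurrentCollectionsSketch
{
    static void Main()
    {
        // ConcurrentDictionary does its own fine-grained synchronization.
        var counts = new ConcurrentDictionary<int, int>();

        Parallel.For(0, 100_000, i =>
        {
            int bucket = i % 10;
            // AddOrUpdate retries atomically if another thread races us.
            counts.AddOrUpdate(bucket, 1, (key, old) => old + 1);
        });

        foreach (var pair in counts)
            Console.WriteLine($"bucket {pair.Key}: {pair.Value}");
    }
}
```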

Making your entire library concurrent is not recommended, since it wasn't designed with concurrency in mind from the start. Your best option is to go through the library, identify which portions of it are actually good candidates for multithreading, and then pick the best concurrency solution for each and implement it. The main thing to remember is that when you write multithreaded code, the concurrency should result in increased throughput for your program. If increased throughput is not achieved (i.e. the throughput merely matches, or is less than, that of the sequential version), then you should simply not use concurrency in that code.
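That last point is worth measuring rather than assuming. A hedged sketch (the `Work` kernel is a hypothetical CPU-bound candidate, not code from the question) of timing the sequential and parallel versions before committing to either:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class MeasureBeforeCommitting
{
    // Hypothetical CPU-bound kernel identified as a parallelism candidate.
    static double Work(int i)
    {
        double acc = 0;
        for (int k = 1; k <= 200; k++) acc += Math.Sqrt(i + k);
        return acc;
    }

    static void Main()
    {
        const int n = 200_000;

        var sw = Stopwatch.StartNew();
        double sequential = Enumerable.Range(0, n).Select(Work).Sum();
        long seqMs = sw.ElapsedMilliseconds;

        sw.Restart();
        double parallel = Enumerable.Range(0, n).AsParallel().Select(Work).Sum();
        long parMs = sw.ElapsedMilliseconds;

        // Keep the parallel version only if it actually wins on this hardware.
        Console.WriteLine($"sequential: {seqMs} ms, parallel: {parMs} ms");
    }
}
```

If the kernel is too cheap per element, scheduling overhead can make the parallel version slower, which is exactly the match-or-worse case where concurrency should be dropped.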

Lirik
Thank you Lirik.
ileon