views:

326

answers:

6

Multithreading is hard. The only this you can do is program very carefully and follow good advice. One great advice I got from the answers on this forum is to avoid mutable state. I understand this is even enforced in the Erlang language. However, I fail to see how this can be done without a severe performance hit and huge amounts of caches.

For example. You have a big list of objects, each containing quite a lot of properties; in other words: a large datastructure. Suppose you have got a bunch of threads and they all need to access and modify the list. How can this be done without shared memory without having to cache the whole datastructure in each of the threads?

Update: After reading the reactions so far, I would like to put some more emphasis on performance. Don't you think that copying the same data around will make the program slower than with shared memory?

+1  A: 

I guess the first question is: why do they need to modify the list? Would it be possible for them to return their changes as a list of modifications rather than actually modifying the shared list? Could they work with a list which looks like it's a mutable version of the original list, but is actually only locally mutable? Are you changing which elements are in the list, or just the properties of those elements?

These are just questions rather than answers, but I'm trying to encourage you to think about your problem in a different way. Look at the bigger picture as the task you want to achieve instead of thinking about the way you'd tackle it in a normal imperative, mutable fashion. Changing the way you think about problems is very difficult, but you may find you get some great "aha!" moments :)

Jon Skeet
You are right, my example is a bit vague. I'll think about it and try to come up with a better example.
Dimitri C.
+1  A: 

There are many pitfalls when working with multiple threads and large sets of data. The advice to avoid mutable state is meant to try to make life easier for you if you can manage to follow the guideline (i.e. if you have no mutable state then multi-threading will be much easier).

If you have a large amount of data that does need to be modified then you perhaps cannot avoid mutable state. An alternative though would be to partition the data into blocks, each of which is passed to a thread for manipulation. The block can be processed and then passed back, and the controller can then perform the updates where necessary. In this scenario you have removed the mutable state from out of the the thread.

If this cannot be done and each thread needs update access to the full list (i.e. it could update any item on the list at any time) then you are going to have a lot of fun trying to make sure you have got your locking strategies and concurrency issues sorted. I'm sure there are scenarios where this is required, and the design pattern of avoiding mutable state may not apply.

samjudson
This is a nice answer. Thanks a lot.
Dimitri C.
+2  A: 

Not each algorithm can be parallelized in a successful manner.

If your program doesn't exhibit any "parallel structure", then you're pretty doomed to use locking and shared, mutable structures.

If your algorithm exhibit structure, then you can express your computation in terms of some patterns or formalism (for ex., a macro dataflow graph) that makes the choice of an immutable datastruct trivial.

So: think in term of the structure of the algorithm and just not in term of the properties of the datastructure to use.

akappa
+1  A: 

Just using immutable data-objects is a big help. Modifying lists sounds like a constructed argument, but consider granular methods that are unaware of lists.

Hugo
A constructed argument? :-) Certainly not. It often come in the situation I've got a large block of closely related data. Indeed, you might divide it in pieces, but I expect this to make the program code much longer and more complex.
Dimitri C.
+1  A: 

If you really need to update the structure one way to do this is have a single worker thread which picks up update requests from a fixed area prtected by a mutex.

If you are clever you can update the structure in place without affecting any "reading" threads (e.g. If you are adding to the end of an array you do all the work to add the new structure but only as the very last instruction do you increment the NoOfMembers count -- the reading threads should not see the new entry until you do this - or - arrange your data as an array of references to structures -- when you want to update a structure you copy the current member, update it, then as the last operation replace the reference in the array)

The other threads then only need to check a single simple "update in progess" mutex only when they activly want to update.

James Anderson
+2  A: 

You can get a great start in thinking about immutable collections, where they are applicable, how they can actually work without requiring lots of copying, etc. by looking through Eric Lippert's articles tagged with immutability:

http://blogs.msdn.com/ericlippert/archive/tags/Immutability/default.aspx

Daniel Earwicker
Or from the book he's porting the code from - http://books.google.co.uk/books?id=SxPzSTcTalAC
Pete Kirkham
Indeed, a great book!
Daniel Earwicker