views:

157

answers:

7

I'm currently writing a large multi threaded C++ program (> 50K LOC).

As such I've been motivated to read up alot on various techniques for handling multi-threaded code. One theory I've found to be quite cool is:

http://en.wikipedia.org/wiki/Communicating_sequential_processes

And it's invented by a slightly famous guy, who's made other non-trivial contributions to concurrent programming.

However, is CSP used in practice? Can anyone point to any large application written in a CSP style?

Thanks!

A: 

This style is ubiquitous on Unix where many tools are designed to process from standard in to standard out. I don't have any first hand knowledge of large systems that are build that way, but I've seen many small once-off systems that are

for instance this simple command line uses (at least) 3 processes.

cat list-1 list-2 list-3 | sort | uniq > final.list
John Knoeller
Yes, but the pipeline is not between processes, not between threads like the OP is asking about.
Omnifarious
The paper he links to is about processes.
John Knoeller
CSP as a theory doesn't care about the distinction between threads and programs. The question as stated still remains valid: is CSP theory used in practice, in multi-threaded programs?
MSalters
A: 

This system is only moderately sized, but I wrote a protocol processor that strips away and interprets successive layers of protocol in a message that used a style very similar to this. It was an event driven system using something akin to cooperative threading, but I could've used multithreading fairly easily with a couple of added tweaks.

The program is proprietary (unfortunately) so I can't show off the source code.

In my opinion, this style is useful for some things, but usually best mixed with some other techniques. Often there is a core part of your program that represents a processing bottleneck, and applying various concurrency increasing techniques there is likely to yield the biggest gains.

Omnifarious
A: 

Microsoft had a technology called ActiveMovie (if I remember correctly) that did sequential processing on audio and video streams. Data got passed from one filter to another to go from input to output format (and source/sink). Maybe that's a practical example??

DougN
not really, one thread is used to push data all the way through the chain of filters.
John Knoeller
+1  A: 

Yes and no. The basic idea of CSP is used quite a bit. For example, thread-safe queues in one form or another are frequently used as the primary (often only) communication mechanism to build a pipeline out of individual processes (threads).

Hoare being Hoare, however, there's quite a bit more to his original theory than that. He invented a notation for talking about the processes, defined a specific set of signals that can be sent between the processes, and so on. The notation has since been refined in various ways, quite a bit of work put into proving various aspects, and so on.

Application of that relatively formal model of CSP (as opposed to just the general idea) is much less common. It's been used in a few systems where high reliability was considered extremely important, but few programmers appear interested in learning (yet another) formal design notation.

When I've designed systems like this, I've generally used an approach that's less rigorous, but (at least to me) rather easier to understand: a fairly simple diagram, with boxes representing the processes, and arrows representing the lines of communication. I doubt I could really offer much in the way of a proof about most of the designs (and I'll admit I haven't designed a really huge system this way), but it's worked reasonably well nonetheless.

Jerry Coffin
A: 

I think that you will find that a few companies use this style of thinking in daily, weekly or monthly data massaging.

Data comes in various formats, spreadsheets, flat lists, scraped web pages, xml whatever. This data has to be further beaten and perhaps transformed before it can be sorted, filtered and merged into central database.

Now you could write a single multiple threaded program that does all of this but managing the start and stopping of all of the thread tasks, tends to require more work and more attention when running, daily weekly data processing as they often go wrong due to threads not waiting before they run etc.

By streaming in a kind of pipe the thread execution sequence less effort is required to manage the threads and less errors occur at thread time.

Downside is that jobs normally takes longer. But throw more/faster hardware at it.

Threads on the surface look wonderful but con currency it a bitch to manage.

Suggest you learn a non-deterministic language like Prolog. Once you know how to program in this style. Threads are not so hard.

+3  A: 

CSP, as a process calculus, is fundamentally a theoretical thing that enables us to formalize and study some aspects of a parallel program.

If you instead want a theory that enables you to build distributed programs, then you should take a look to parallel structured programming.

Parallel structural programming is the base of the current HPC (high-performance computing) research and provides to you a methodology about how to approach and design parallel programs (essentially, flowcharts of communicating computing nodes) and runtime systems to implements them.

A central idea in parallel structured programming is that of algorithmic skeleton, developed initially by Murray Cole. A skeleton is a thing like a parallel design pattern with a cost model associated and (usually) a run-time system that supports it. A skeleton models, study and supports a class of parallel algorithms that have a certain "shape".

As a notable example, mapreduce (made popular by Google) is just a kind of skeleton that address data parallelism, where a computation can be described by a map phase (apply a function f to all elements that compose the input data), and a reduce phase (take all the transformed items and "combine" them using an associative operator +).

I found the idea of parallel structured programming both theoretical sound and practical useful, so I'll suggest to give a look to it.

A word about multi-threading: since skeletons addresses massive parallelism, usually they are implemented in distributed memory instead of shared. Intel has developed a tool, TBB, which address multi-threading and (partially) follows the parallel structured programming framework. It is a C++ library, so probably you can just start using it in your projects.

akappa
A: 

The Wikipedia article looks to me like a lot of funny symbols used to represent somewhat pedestrian concepts. For very large or extensible programs, the formalism can be very important to check how the (sub)processes are allowed to interact.

For a 50,000 line class program, you're probably better off architecting it as you see fit.

In general, following ideas such as these is a good idea in terms of performance. Persistent threads that process data in stages will tend not to contend, and exploit data locality well. Also, it is easy to throttle the threads to avoid data piling up as a fast stage feeds a slow stage: just block the fast one if its output buffer grows too big.

Potatoswatter