Parallel computing is increasingly common, and new framework features and shortcuts make it easier to use (for example, the Parallel Extensions that ship with .NET 4).
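For reference, the in-process case looks roughly like the sketch below. It assumes .NET 4 or later; `LocalParallelSketch`, `Square`, and `SumOfSquares` are placeholder names made up for illustration, and `Square` merely stands in for real CPU-bound work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class LocalParallelSketch
{
    // Placeholder for a CPU-bound computation (hypothetical, for illustration only).
    public static long Square(int n)
    {
        return (long)n * n;
    }

    // Runs Square over the inputs on the local thread pool and returns the sum.
    // ConcurrentBag is used because several threads add results at the same time.
    public static long SumOfSquares(int[] inputs)
    {
        var results = new ConcurrentBag<long>();
        Parallel.ForEach(inputs, item => results.Add(Square(item)));
        return results.Sum();
    }
}
```

Note that the lambda freely captures `results`, a local variable living in shared memory. That is exactly the kind of thing that stops working once the loop body runs in another process on another machine.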

Now what about parallelism across a network? I mean an abstraction of everything related to communication, creation of processes on remote machines, etc. Something like, in C#:

NetworkParallel.ForEach(myEnumerable, item =>
{
    // Computation and/or access to a web resource or local network database here
});

I understand that this is very different from multi-core parallelism. The two most obvious differences would probably be:

  • Such a parallel task would be limited to computation. It could not, for example, use files stored locally (though why not a database?) or even local variables, because it would effectively be two distinct applications rather than two threads of the same application,
  • The implementation is very specific: it requires not just a separate thread (which is quite easy), but spawning processes on different machines and then communicating with them over the local network.

Despite those differences, such parallelism is quite possible, even without speaking about distributed architecture.

Do you think it will be implemented within a few years? Do you agree that it would let developers build extremely powerful things with much less pain?

Example:
Think about a business application that extracts data from a database, transforms it, and displays statistics. Say that on a single machine in a company it takes ten seconds to load the data, twenty seconds to transform it, and ten seconds to build the charts, using all of that machine's CPU, while ten other machines sit at 5% CPU most of the time. In such a case, every step could be done in parallel, probably bringing the overall process down to six to ten seconds instead of forty.

+3  A: 

This is typically handled in a different manner than in-process concurrency. The issues which arise due to architecture are much greater, and the lack of shared memory causes other concerns to arise.

That being said, "parallelism across a network" has been in use for a very long time. The most common option is to use the Message Passing Interface (MPI). There is even a C# library for this, MPI.NET.

Now, the goal of "completely abstracting away" the work of partitioning and calling out across the network has not been achieved (though MPI does handle many of these tasks in a relatively straightforward manner). I doubt this will happen soon, either, since many new concerns arise when you lose shared memory. I suspect that projects such as Axum will eventually lead to a very highly abstracted means of accomplishing this, but also that it is still quite a few years out, since in-process, shared-memory concurrency is only now becoming common and mainstream.

Reed Copsey
+1 for bringing up distributed memory parallel computing with MPI. The names "Parallel" LINQ and Task "Parallel" Library have always bothered me because when I think of parallel computing I think of MPI and distributed memory parallel computing.
Taylor Leese
MainMa
+1  A: 

Now what about parallelism across a network? I mean an abstraction of everything related to communication, creation of processes on remote machines, etc.

It has been tried many times before, and such abstractions usually fail because they embody the fallacies of distributed computing. A network failure at some point during a calculation is far more likely than an ordinary hardware failure, so you need to use fault-tolerant and latency-tolerant patterns of communication rather than relying on procedural idioms.
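As a rough illustration of what a fault-tolerant and latency-tolerant call looks like compared with a plain procedure call, here is a minimal sketch using only the .NET 4 Task Parallel Library. `RemoteCallSketch`, `Outcome<T>`, and `CallWithRetry` are hypothetical names, and `remoteCall` stands in for whatever actually talks to the other machine. The point is that timeouts, retries, and failure all surface in the caller's code instead of being hidden.

```csharp
using System;
using System.Threading.Tasks;

public static class RemoteCallSketch
{
    // Result handed back to the caller: a value, or the error from the last attempt.
    public sealed class Outcome<T>
    {
        public bool Succeeded;
        public T Value;
        public Exception Error;
        public int Attempts;
    }

    // Tries remoteCall up to maxAttempts times, giving each attempt at most `timeout`.
    // Failure is reported to the caller instead of being hidden behind the call.
    public static Outcome<T> CallWithRetry<T>(Func<T> remoteCall, int maxAttempts, TimeSpan timeout)
    {
        var outcome = new Outcome<T>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            outcome.Attempts = attempt;
            Task<T> task = Task.Factory.StartNew(remoteCall);
            // Observe late failures of attempts we may abandon after a timeout.
            task.ContinueWith(t => { var ignored = t.Exception; },
                              TaskContinuationOptions.OnlyOnFaulted);
            try
            {
                if (task.Wait(timeout))
                {
                    outcome.Succeeded = true;
                    outcome.Value = task.Result;
                    outcome.Error = null;
                    return outcome;
                }
                // The attempt is abandoned, not cancelled: it may still be running.
                outcome.Error = new TimeoutException("No reply within " + timeout);
            }
            catch (AggregateException ex)
            {
                // Wait wraps the exception thrown by the call in an AggregateException.
                outcome.Error = ex.InnerException ?? ex;
            }
        }
        return outcome;
    }
}
```

A real system would also need a back-off between attempts and idempotent operations, since a call that timed out may still have taken effect on the remote side.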

Pete Kirkham
The news that distributed computing is a fallacy will come as a surprise to all those owners of clusters, grids and supercomputers. How you will laugh.
High Performance Mark
@High Performance Mark 'fallacies of X' does not imply X is a fallacy. HPC clusters expend a great deal of effort reducing latency and increasing the reliability of their local networks, and the successful grid protocols always report failures to the application rather than pretending they can't happen. Beowulf grid middleware also uses checkpointing for reliability, so that a job can be restarted after a failure. None of these systems abstracts away the network.
Pete Kirkham