views:

100

answers:

5

In C++, I've written a mathematical program (for diffusion limited aggregation) where each new point calculated is dependent on all of the preceding points. Is it possible to have such a program work in a parallel or distributed manner to increase computing speed? If so, what type of modifications to the code would I need to look into?

EDIT: My source code is available at... http://www.bitbucket.org/damigu78/brownian-motion/downloads/ filename is DLA_full3D.cpp I don't mind significant re-writes if that's what it would take. After all, I want to learn how to do it.

+3  A: 

If your algorithm is fundamentally sequential, you can't make it fundamentally not that.

What is the algorithm you are using?

EDIT: Googling "diffusion limited aggregation algorithm parallel" lead me here, with the following quote:

DLA, on the other hand, has been shown [9,10] to belong to the class of inherently sequential or, more formally, P-complete problems. Therefore, it is unlikely that DLA clusters can be sampled in parallel in polylog time when restricted to a number of processors polynomial in the system size.

So the answer to your question is "all signs point to no".

MSN
A: 

You mention that each step depends on the results of all preceding steps, which makes it hard to parallelize such a program.

I don't know which algorithm you are using, but you could use multithreading for speedup. Each thread would process one step, but must wait for results that haven't yet been calculated (though it can work with the already calculated results if they don't change values over time). That essentially means you would have to use a locking/waiting mechanism in order to wait for results that haven't yet been calculated but are currently needed by a certain worker thread to go on.

AndiDog
What about a comment, Mr. Downvote?
AndiDog
A: 

Probably. There are parallel versions of most sequential algorithms, and for those sequential algorithms which are not immediately parallelisable there are usually parallel substitutes. This looks like be one of those cases where you need to consider parallelisation or parallelisability before you choose an algorithm. But unless you tell us a bit (a lot ?) more about your algorithm we can't provide much specific guidance. If it amuses you to watch SOers argue in the absence of hard data sit back and watch, but if you want answers, edit your question.

The toxiclibs website gives some useful insight into how one DLA implementation is done

High Performance Mark
A: 

There is cilk, which is an enhancement to the C language (unfortunately not C++ (yet)) that allows you to add some extra information to your code. With just a few minor hints, the compiler can automatically parallelize parts of your code, such as running multiple iterations of a for loop in parallel instead of in series.

Adam Rosenfield
A: 

Without knowing more about your problem, I'll just say that this looks like a good candidate to implement as a parallel prefix scan (http://en.wikipedia.org/wiki/Prefix_sum). The simplest example of this is an array that you want to make a running sum out of:

1 5 3 2 5 6

becomes

1 6 9 11 16 22

This looks inherently serial (as all the points depend on the ones previous), but it can be done in parallel.

miked