Simple Distributed Computation (similar to summation) (in C++)

I'm looking for a framework / approach to do message passing distributed computation in C++.

I've currently got an iterative, single-threaded algorithm that incrementally updates some data model. The updates are literally additive, and I'd like to distribute (or at least parallelize) the computation hereof over as many machines+cores as possible. The data model can be viewed as a big array of (independent) floating point values.

Since the updates are all additive (i.e. commutative and associative), it's OK to merge in updates from other nodes in arbitrary order or even to batch merge updates. When it comes to applying updates, the map/reduce paradigm would work fine.

On the other hand, the updates are computed with respect to the current model state. Each step "corrects" some flaw, so it's important that the model used for computing the update is as fresh as possible (the more out of date the model, the less useful the update). Worst case, the updates are fully dependent, and parallelism doesn't do any good.

I've never implemented anything flexibly distributable, but this looks like a prime candidate. So, I'm looking for some framework or approach to distribute the updates (which consist mostly of floating point numbers and a few indexes into the array to pinpoint where to add the update). But, I'm unsure as to how:

I can broadcast updates to all connected processes. But that means massive network traffic, so I'd realistically need to batch updates; and then updates will be less current. This doesn't look scalable anyhow.
I can do some kind of ring topology. Basically, a machine sends the next machine the sum of its own updates and those of it's predecessors. But then I'd need to figure out how to not duplicate updates, after all, the ring is circular and eventually it's own updates will arrive as part of the sum of its predecessors.
or some kind of tree structure...

To recap, to get decent convergence performance, low latency is critical; the longer between update computation and update application, the less useful the update is. Updates need to be distributed to all nodes as quickly as possible; but because of the commutative and associate nature of the updates, it doesn't matter whether these updates are individually broadcast (probably inefficient) or arrive as part of a merged batch.

Does anybody know of any existing frameworks or approaches to speed up development? Or even just general pointers? I've never done anything quite like this...

ansaurus

tags:

views:

answers:

Simple Distributed Computation (similar to summation) (in C++)

related questions