Assuming I have a cluster of n Erlang nodes, some of which may be on my LAN, while others may be connected using a WAN (i.e. via the internet), what are suitable mechanisms to cater for a) different bandwidth availability/behavior (e.g. latency-induced) and b) nodes with differing computational power (or even memory constraints, for that matter)?

In other words, how do I prioritize local nodes with lots of computational power over those that have high latency and may be less powerful? And how would I ideally assign high-performance remote nodes with high transmission latencies specifically to those processes that have a relatively large computation-to-transmission ratio (i.e. completed work per message, per time unit)?

I am mostly thinking in terms of benchmarking each node in a cluster by sending it a benchmark process to run during initialization, so that the latencies involved in messaging can be measured, as well as the overall computation speed (i.e. using a node-specific timer to determine how fast a node completes a given task).
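
Something along these lines is what I have in mind (just a minimal sketch: the module name bench, the rpc-based probing, and the busy_loop workload are all illustrative choices of mine, and it assumes the same module is loaded on the remote node):

    -module(bench).
    -export([probe_node/1, busy_loop/1]).

    %% Returns {RttMicros, ComputeMicros} for the given node.
    probe_node(Node) ->
        %% Round-trip latency: time a trivial rpc that does no real work.
        {Rtt, _} = timer:tc(rpc, call, [Node, erlang, node, []]),
        %% Compute speed: time a fixed CPU-bound task on the remote node,
        %% then subtract the round trip to approximate pure compute time.
        {Total, _} = timer:tc(rpc, call, [Node, ?MODULE, busy_loop, [1000000]]),
        {Rtt, Total - Rtt}.

    %% Deliberately CPU-bound workload; must be loaded on the remote node.
    busy_loop(0) -> ok;
    busy_loop(N) -> busy_loop(N - 1).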

Something like that would probably have to be done repeatedly, on the one hand to get representative data (i.e. by averaging it), and on the other hand because it might even be useful at runtime in order to adjust dynamically to changing conditions.
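
For instance, a monitor process could re-probe all connected nodes periodically and keep an exponential moving average (again just a sketch: it reuses the hypothetical probe_node/1 from above, and the interval and smoothing factor are arbitrary):

    -module(node_monitor).
    -export([start/0]).

    -define(ALPHA, 0.2).       %% weight given to the newest sample
    -define(INTERVAL, 30000).  %% re-probe every 30 seconds

    start() -> spawn(fun() -> loop(#{}) end).

    loop(Stats) ->
        NewStats = lists:foldl(
            fun(Node, Acc) ->
                    {Rtt, Cpu} = bench:probe_node(Node),
                    {R0, C0} = maps:get(Node, Acc, {Rtt, Cpu}),
                    Acc#{Node => {?ALPHA * Rtt + (1 - ?ALPHA) * R0,
                                  ?ALPHA * Cpu + (1 - ?ALPHA) * C0}}
            end, Stats, nodes()),
        timer:sleep(?INTERVAL),
        loop(NewStats).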

(In the same sense, one would probably want to prioritize locally running nodes over those running on other machines)

This would be meant to hopefully optimize internal job dispatch so that specific nodes handle specific jobs.

Thanks

+1  A: 

The problem you are talking about has been tackled in many different ways in the context of Grid computing (e.g., see Condor). To discuss this more thoroughly, I think some additional information is required (homogeneity of the problems to be solved, degree of control over the nodes [i.e. is there unexpected external load, etc.?]).

Implementing an adaptive job dispatcher will usually also require adjusting the frequency with which you probe the available resources (otherwise the overhead of probing could exceed the performance gains).

Ideally, you might be able to use benchmark tests to come up with an empirical (statistical) model that lets you predict the computational hardness of a given problem (this requires good domain knowledge and problem features that have a high impact on execution speed and are simple to extract), and a second model to predict communication overhead. Used in combination, the two should make it possible to implement a simple dispatcher that bases its decisions on the predictive models and improves them by taking actual execution times into account as feedback/reward (e.g., via reinforcement learning).
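
As a very crude sketch of that feedback loop (a moving-average update standing in for a real reinforcement-learning scheme; all names, the cost model, and the 0.8/0.2 blend are purely illustrative):

    -module(dispatcher).
    -export([dispatch/3]).

    %% Model maps Node => {PredictedLatency, PredictedSpeed}; WorkSize is
    %% the extracted problem feature (estimated work units for this job).
    dispatch(Model, Fun, WorkSize) ->
        %% Predicted cost: communication latency plus estimated compute time.
        Costs = [{Lat + WorkSize / Speed, Node}
                 || {Node, {Lat, Speed}} <- maps:to_list(Model)],
        {_, Best} = lists:min(Costs),
        %% Fun must be available on the remote node (same module loaded).
        {Elapsed, Result} = timer:tc(rpc, call, [Best, erlang, apply, [Fun, []]]),
        %% Feedback: blend the observed time back into the latency estimate.
        {Lat0, Speed0} = maps:get(Best, Model),
        NewLat = 0.8 * Lat0 + 0.2 * (Elapsed - WorkSize / Speed0),
        {Result, Model#{Best => {NewLat, Speed0}}}.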

__roland__
+1  A: 

We've done something similar to this, on our internal LAN/WAN only (the WAN being, for instance, San Francisco to London). The problem boiled down to a combination of these factors:

  1. The overhead in simply making a remote call over a local (internal) call
  2. The network latency to the node (as a function of the request/result payload)
  3. The performance of the remote node
  4. The compute power needed to execute the function
  5. Whether batching calls provides any performance improvement when there is a shared "static" data set

For 1. we assumed no overhead (it was negligible compared to the others)

For 2. we actively measured it, using probe messages to gauge round-trip time, and we collated information from actual calls made

For 3. we measured it on the node and had the nodes broadcast that information (this changed depending on the load currently active on the node)
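
In spirit, the broadcast part looked something like this (a simplified sketch rather than our actual code; run-queue length stands in as a cheap proxy for load, and it assumes a registered load_collector process on each peer node):

    -module(load_bcast).
    -export([start/0]).

    start() -> spawn(fun loop/0).

    loop() ->
        %% Run-queue length as a cheap proxy for how busy this node is.
        Load = erlang:statistics(run_queue),
        [{load_collector, N} ! {node_load, node(), Load} || N <- nodes()],
        timer:sleep(5000),
        loop().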

For 4 and 5. we worked it out empirically for the given batch

Then the caller solved for the minimum-cost assignment of a batch of calls (in our case, pricing a whole bunch of derivatives) and fired them off to the nodes in batches.
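
A greatly simplified sketch of that kind of batch assignment (greedy rather than a true optimum, and the latency/speed cost model is a placeholder):

    -module(batcher).
    -export([assign/2]).

    %% Nodes: [{Node, {Latency, Speed}}]; Jobs: list of per-job work estimates.
    %% Greedy: each job goes to the node with the lowest estimated finish time.
    assign(Nodes, Jobs) ->
        Init = [{Lat, Node, Speed, []} || {Node, {Lat, Speed}} <- Nodes],
        Final = lists:foldl(
            fun(Job, States) ->
                    %% Sorting puts the node with the earliest finish time first.
                    [{T, Node, Speed, Acc} | Rest] = lists:sort(States),
                    [{T + Job / Speed, Node, Speed, [Job | Acc]} | Rest]
            end, Init, Jobs),
        [{Node, lists:reverse(Acc)} || {_, Node, _, Acc} <- Final].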

We got much better utilization of our calculation "grid" using this technique but it was quite a bit of effort. We had the added advantage that the grid was only used by this environment so we had a lot more control. Adding in an internet mix (variable latency) and other users of the grid (variable performance) would only increase the complexity with possible diminishing returns...

Alan Moore
Thanks for your response, the technique you used is pretty much in line with what I have been envisioning (and what I sketched out in the question). I think it would be interesting to see exactly this sort of scenario become supported by some form of erlang infrastructure (e.g. using OTP). I have accepted your answer because it really is very close to my scenario.