From hard experience I've found it useful to occasionally save the state of my long computations to disk, so that I can restart them later if something fails. Can I do this in a distributed computation package in R (like SNOW or multicore)?
It is not clear how this could be done, since the master collects results from the slaves in a non-transparent way.

A: 

This is (again :-) a hard one.

You could try to dump snapshots on the nodes using save() or save.image(). You could then try to re-organize your code so that the nodes can resume after the last snapshot.
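A minimal sketch of that node-side snapshot idea, assuming a worker that checkpoints with save() every so many iterations and resumes from the last snapshot on restart (the function name and state variables are illustrative, not part of SNOW):

```r
# Hypothetical node-side worker: checkpoints its state every `every`
# iterations via save(), and resumes from the last snapshot if one exists.
run_with_checkpoints <- function(total, every = 10,
                                 file = "node_state.RData") {
  i <- 1; acc <- 0                        # default starting state
  if (file.exists(file)) {
    load(file, envir = environment())     # restores i and acc
    i <- i + 1                            # resume after the last saved step
  }
  while (i <= total) {
    acc <- acc + i                        # stand-in for the real work
    if (i %% every == 0) save(i, acc, file = file)
    i <- i + 1
  }
  acc
}

res <- run_with_checkpoints(100, file = tempfile())  # sum of 1..100
```

Note the explicit `envir = environment()` on load(): by default load() restores into the caller's frame, not the worker function's own.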

Or you could try to re-organize your workflow such that nodes 'take tickets' and return the results. That way the central node keeps tabs on everything and you can log interim results there.
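The ticketing idea can be sketched serially like this: the master owns the task list, hands out one ticket at a time, and logs each interim result to disk so a crashed run only redoes unfinished tickets (function names are illustrative; in SNOW the worker call would run on a remote node):

```r
# Serial sketch of the 'ticketing' workflow: the master keeps tabs on
# everything and logs interim results, so a restart skips finished tasks.
run_tickets <- function(tasks, worker, log = "results.rds") {
  done <- if (file.exists(log)) readRDS(log) else list()
  todo <- setdiff(names(tasks), names(done))  # only unfinished tickets
  for (id in todo) {
    done[[id]] <- worker(tasks[[id]])         # remote call in a real cluster
    saveRDS(done, log)                        # interim results survive a crash
  }
  done
}

tasks <- setNames(as.list(1:5), paste0("t", 1:5))
out <- run_tickets(tasks, function(x) x^2, log = tempfile(fileext = ".rds"))
```

Re-running with the same log file would find all five tickets already done and return immediately.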

Either way, what you desire is not available out of the box (as far as I know).

Dirk Eddelbuettel
Do you think if I transitioned to NWS I could dump the workspace every couple *large number* of iterations? Even though I'm running on multiple cores, I could maybe count through the random-number streams to retrieve the RNG state as well.
James
But if you 'phone home' from the nodes you get all that communication overhead. It's tough -- but ultimately it's your trade-off to make. And RNG state can be dumped easily. But, e.g., in the 'ticketing' approach I mentioned, you could provide the seed from the master for each task, and then you'd control things.
Dirk Eddelbuettel