The other day I needed to archive a lot of data on our network, and I was frustrated that I had no immediate way to harness the power of multiple machines to speed up the process.

I understand that creating a distributed job management system is a leap from a command-line archiving tool.

I'm now wondering what the simplest solution to this type of distributed performance scenario could be. Would a custom tool always be a requirement, or are there ways to use standard utilities and somehow distribute their load transparently at a higher level?

Thanks for any suggestions.

+1  A: 

One way to tackle this might be to use a distributed make system to run scripts across networked hardware. This is, or at least used to be, an experimental feature of some implementations of GNU Make, and Solaris ships a dmake utility for the same purpose.
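
For example, with a Makefile in which each archive is its own independent target, a plain "make -j 4" runs the jobs in parallel on one box, and a distributed make such as Solaris dmake can farm the same targets out to whichever build hosts it has been configured with. A rough, untested sketch (the directory names are placeholders, and recipe lines must begin with a tab):

    # GNU-make-flavoured sketch: one independent archiving job per directory
    all: project1.tar.gz project2.tar.gz project3.tar.gz

    # build <dir>.tar.gz from the directory <dir>
    %.tar.gz: %
    	tar czf $@ $<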

Another, more heavyweight, approach might be to use Condor to distribute your archiving jobs. That said, I doubt you would install Condor just for the twice-yearly archiving runs; it's more of a system for regularly scavenging spare cycles from networked hardware.
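
If you did go that way, each directory would become one Condor job described by a submit file along these lines. This is a hypothetical sketch with made-up paths, and it assumes the data sits on a filesystem every execute machine can see:

    # archive.sub -- hypothetical submit description; all paths are placeholders
    universe   = vanilla
    executable = /bin/tar
    arguments  = czf /shared/archives/project1.tar.gz /shared/data/project1
    output     = project1.out
    error      = project1.err
    log        = archive.log
    queue

You would queue it with "condor_submit archive.sub" and keep an eye on it with "condor_q".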

The SCons build system, which is really a Python-based replacement for make, could probably be persuaded to hand work off across the network.

Then again, you could simply use scripts that ssh into networked PCs to start jobs there.
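
A minimal sketch of that last idea, with invented host names and paths, assuming passwordless ssh and a shared filesystem (NFS, say) so every machine sees the same data:

    #!/bin/bash
    # Fan the archiving jobs out round-robin over ssh, then wait for them all.
    # Hosts, paths and directory names are placeholders.
    hosts=(pc1 pc2 pc3)
    dirs=(project1 project2 project3 project4 project5 project6)

    for i in "${!dirs[@]}"; do
        host=${hosts[i % ${#hosts[@]}]}
        dir=${dirs[i]}
        ssh "$host" "tar czf /shared/archives/$dir.tar.gz -C /shared/data $dir" &
    done
    wait    # block until every remote tar has finished

There's no load balancing beyond the round-robin in the loop, but for a handful of machines that is usually enough.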

So there are a few ways you could approach this without having to take up parallel programming with all the fun that that entails.

High Performance Mark
Thank you, Mark, that was very thoughtful. You've offered a lot of avenues for me to explore! Condor looks particularly interesting; although you're correct that the limited schedule of these particular archiving needs may not merit it alone, perhaps there are other ways to apply it that I'm about to realize. Thanks again.
barnaby