I want to be able to distribute bundles of files, about 500 MB per bundle, to all machines on a corporate "extranet" (which is basically a few LANs connected using various private mechanisms, including leased lines and VPN).

The total number of hosts is roughly 100, and the goal is to get a copy of the bundle from one host onto all the other hosts reliably, quickly, and efficiently. One important issue is that some hosts share a single fast LAN. In that case the bundle should cross the slow link from one group to the next only once, and then be copied among the peers within each group. A strict central-server system, by contrast, might have several hosts each fetch the same bundle over the same slow link, instead of fetching it once over the slow link and then copying it between each other quickly.
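The two-tier pattern described above (cross each slow link once, then copy within the LAN) can be sketched in Python. This is only a sketch under a loud assumption: each LAN is identified by a /24 IPv4 subnet, which real sites may not follow. `group_by_lan` and `plan_two_tier` are hypothetical helper names, and the within-LAN copies all come from one seed host rather than a smarter peer-to-peer fan-out.

```python
import ipaddress
from collections import defaultdict


def group_by_lan(hosts, prefix_len=24):
    """Group host IPs by the /prefix_len network they belong to.

    ASSUMPTION: one LAN == one IPv4 subnet of this prefix length.
    Real sites may need an explicit host-to-site table instead.
    """
    groups = defaultdict(list)
    for ip in hosts:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[net].append(ip)
    return dict(groups)


def plan_two_tier(source, hosts, prefix_len=24):
    """Return (wan_copies, lan_copies) as lists of (from_host, to_host).

    The source sends one copy to a "seed" host in each remote LAN, so each
    slow link is crossed once; each seed then feeds the rest of its own LAN.
    """
    others = [h for h in hosts if h != source]
    groups = group_by_lan([source] + others, prefix_len)
    wan, lan = [], []
    for members in groups.values():
        # The source seeds its own LAN; elsewhere the first host listed is the seed.
        seed = source if source in members else members[0]
        if seed != source:
            wan.append((source, seed))
        for h in members:
            if h != seed:
                lan.append((seed, h))
    return wan, lan
```

For example, with the source at 10.0.1.1 and remote hosts 10.0.2.1 to 10.0.2.3, only one copy (10.0.1.1 to 10.0.2.1) crosses the slow link; 10.0.2.1 then feeds 10.0.2.2 and 10.0.2.3 locally.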

A new bundle will be produced every few days, and occasionally old bundles will be deleted (but that problem can be solved separately).

The machines in question happen to run recent Linuxes, but bonus points will go to solutions which are at least somewhat cross-platform (in which case the bundle might differ per platform but maybe the same mechanism can be used).

That's pretty much it. I'm not opposed to writing some code to handle this, but I'd prefer it to be in bash, Python, Ruby, Lua, C, or C++.

A: 

What about rsync?

compie
While rsync is great in itself, it's not a complete solution: we want to copy once across each slower link and then many times within a LAN. rsync isn't smart enough to do that by itself. You could invoke rsync in a smart way, but ideally I'd like something that doesn't require a lot of hand-coding of paths between hosts and so on (otherwise I'd just write a giant script with a different case for each host).
John Zwinck
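Invoking rsync "in a smart way" could mean deriving the rsync commands from a distribution tree instead of hand-coding them per host. A minimal Python sketch, assuming the tree is already known as a `{host: [children]}` dict and that bundles live under a hypothetical `/srv/bundles/<name>` path. It only builds the command lines, in breadth-first order so that each sender appears only after it has received the bundle; actually running each command on its sending host (for example over ssh) is left out. `-a` (archive mode) and `--partial` (keep partially transferred files, so an interrupted copy need not start from scratch) are standard rsync options; `rsync_plan` is an illustrative name.

```python
from collections import deque


def rsync_plan(tree, root, bundle_dir):
    """Turn a distribution tree into an ordered list of rsync invocations.

    tree:       {host: [child hosts]}; root already holds the bundle.
    bundle_dir: path of the bundle on every host (hypothetical layout).
    Returns [(host_to_run_on, argv)] in breadth-first order.
    """
    plan = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in tree.get(parent, []):
            # Run on `parent`: push the bundle directory to `child`.
            argv = ["rsync", "-a", "--partial",
                    f"{bundle_dir}/", f"{child}:{bundle_dir}/"]
            plan.append((parent, argv))
            queue.append(child)
    return plan
```

Commands within one level of the tree are independent and could run in parallel; only a child's own forwarding has to wait for its copy to finish.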
A: 

I'm going to suggest you use compie's idea of rsync to copy the files, in which case you can use a scripting language of your choice.

On the propagating system you will need a script containing some representation of the hosts and a matrix of the links between them, weighted by link cost (for example, the time to transfer a bundle, so that slow links weigh more than fast ones). You then calculate a minimum spanning tree (MST) from that information. Next, send messages to the systems you intend to propagate to, detailing the MST and the bundle to fetch, whereupon the script/daemon on each of them begins the transfer. Each host then contacts the next hosts over the fastest links, and so on.

You could implement it in bash, though Python or a custom C daemon might be better.

When the network changes, you'll need to update the matrix with the latest information.

See: Prim's Algorithm.
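A sketch of Prim's algorithm over the host/link matrix, in Python. Assumptions: link costs are given as a symmetric dict of dicts where lower is better (for example, estimated seconds to move one bundle, so slow WAN links cost more than LAN links), and the result is returned as a `{parent: [children]}` tree rooted at the host that produced the bundle. `symmetric_costs` and `prim_mst` are illustrative names, not an existing tool.

```python
import heapq


def symmetric_costs(edges):
    """Build {a: {b: cost}} from (a, b, cost) triples; links are assumed symmetric."""
    cost = {}
    for a, b, c in edges:
        cost.setdefault(a, {})[b] = c
        cost.setdefault(b, {})[a] = c
    return cost


def prim_mst(hosts, cost, root):
    """Prim's algorithm: grow a minimum spanning tree outward from root.

    hosts: every host name (cost must not mention hosts outside this list).
    cost:  {a: {b: link_cost}}; a missing entry means no direct link.
    Returns {parent: [children]}; hosts unreachable from root get no parent.
    """
    tree = {h: [] for h in hosts}
    in_tree = {root}
    # Min-heap of candidate edges (cost, from_host, to_host) leaving the tree.
    frontier = [(c, root, b) for b, c in cost.get(root, {}).items()]
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(hosts):
        c, a, b = heapq.heappop(frontier)
        if b in in_tree:
            continue  # stale edge: b is already in the tree
        in_tree.add(b)
        tree[a].append(b)
        for nxt, c2 in cost.get(b, {}).items():
            if nxt not in in_tree:
                heapq.heappush(frontier, (c2, b, nxt))
    return tree
```

With links hq-a1 = 10, hq-a2 = 10, hq-b1 = 12, a1-a2 = 1 and a1-b1 = 20, the result is `{"hq": ["a1", "b1"], "a1": ["a2"], "a2": [], "b1": []}`: `a2` gets the bundle from its LAN neighbour `a1`, so each slow link is crossed only once.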

Ninefingers
How would you deal with host failures? One host on a given LAN going down shouldn't impact the overall system, and further, it'd be nice if that host somehow knew how to catch up when it came back online.
John Zwinck
It sounds like you effectively need to build a routing system. In that case, each host recalculates the MST based on which hosts it can ping, then contacts them to ask "Have you got the bundle?" and, if not, updates them. Have each host know its latest bundle so it can reply easily, and have each host track what every other host has and whom it received it from.
Ninefingers
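The per-host catch-up logic from the comment above could look roughly like this in Python. It assumes bundle versions are monotonically increasing integers, that each host keeps a table of `{peer: (link_cost, latest_version)}` from earlier exchanges, and that "reachable" is whatever set of peers currently answers a ping. Both function names are hypothetical.

```python
def catch_up_source(my_version, peers, reachable):
    """Pick the cheapest reachable peer that holds a newer bundle.

    peers:     {host: (link_cost, latest_bundle_version)} as last reported.
    reachable: set of hosts that currently answer (e.g. to a ping).
    Returns a host name, or None if no reachable peer is ahead of us.
    """
    candidates = [(cost, host) for host, (cost, ver) in peers.items()
                  if host in reachable and ver > my_version]
    return min(candidates)[1] if candidates else None


def peers_to_notify(my_version, peers, reachable):
    """Reachable peers that report an older bundle than ours, sorted by name."""
    return sorted(host for host, (_, ver) in peers.items()
                  if host in reachable and ver < my_version)
```

A host that comes back online with an old bundle calls `catch_up_source` to find the cheapest reachable peer that is ahead of it; a host that has just received a new bundle calls `peers_to_notify` to find whom to offer it to.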
I can't help but think there must be some existing programs or systems that I can script, though: "build a routing system" probably takes more development effort than I'd like. At a minimum (no pun intended), I'd want the MST computation to be done automatically by some existing program or system, so that it could simply tell me an appropriate host. This is somewhat like OSPF in the networking world, except that OSPF is for one path whereas an MST covers multiple paths (so an MST is more appropriate, topologically speaking).
John Zwinck
A: 

I think all these problems have been solved by modern research into P2P networking and are well packaged into usable tools. A bit of scripting plus BitTorrent should solve them. Torrent clients exist for all modern OSes; then you need a script on each machine that checks a location for a new torrent file, starts the download, and deletes the old bundle once the download has finished.
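A minimal sketch of that per-machine watcher, in Python. `start_download` is a hypothetical callback standing in for whatever headless BitTorrent client is installed (no particular client's command-line interface is assumed), and the one-directory-per-bundle layout is also an assumption.

```python
import shutil
from pathlib import Path


def poll_once(watch_dir, seen, start_download):
    """One pass of the watcher: hand each unseen .torrent file to start_download.

    start_download: HYPOTHETICAL callback wrapping the installed BT client.
    seen:           set of torrent file names already handled (updated in place).
    Returns the names of the torrents started on this pass.
    """
    started = []
    for torrent in sorted(Path(watch_dir).glob("*.torrent")):
        if torrent.name in seen:
            continue
        start_download(torrent)
        seen.add(torrent.name)
        started.append(torrent.name)
    return started


def prune_bundles(bundle_dir, keep):
    """Delete bundle directories whose names are not in keep; return their names."""
    removed = []
    for d in sorted(Path(bundle_dir).iterdir()):
        if d.is_dir() and d.name not in keep:
            shutil.rmtree(d)
            removed.append(d.name)
    return removed
```

Run `poll_once` periodically (from cron, or a loop with a sleep), and call `prune_bundles` only after the client reports the new download as complete, so a host never deletes its only good copy.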

Adam Shiemke
BitTorrent is tempting, but how can privacy be ensured? The bundle of files is not meant for public consumption, yet some of the computers involved do have access to the internet. Do you know of a non-GUI BT program which is suitable for private use like this? I looked around and came up a bit short.
John Zwinck
Encrypt the bundle and make the torrent file accessible only to clients who authenticate (password-protected FTP or something). Intended clients won't share with external clients unless an external client has the torrent file. Then apply some simple encryption to the bundle, which your script decrypts after the download is complete. You can get a headless BT client as a Python script, which should be cross-platform. Lots of command-line BT clients exist for Linux; I'm not sure about Windows. uTorrent might do what you need.
Adam Shiemke
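Encrypting the bundle before seeding could be done with the third-party Python `cryptography` package (`pip install cryptography`), whose `Fernet` recipe provides authenticated symmetric encryption: a wrong key or a tampered file makes decryption fail with `InvalidToken` instead of silently producing garbage. This is a sketch, not a hardened design. It assumes the key (from `Fernet.generate_key()`) is distributed to the intended hosts out of band, and it reads the whole file into memory, so a 500 MB bundle needs well over 500 MB of RAM; avoiding that would require encrypting in chunks. `encrypt_bundle` and `decrypt_bundle` are hypothetical names.

```python
from pathlib import Path

from cryptography.fernet import Fernet  # third-party: pip install cryptography


def encrypt_bundle(src, dst, key):
    """Encrypt the archived bundle at src into dst before it is seeded."""
    Path(dst).write_bytes(Fernet(key).encrypt(Path(src).read_bytes()))


def decrypt_bundle(src, dst, key):
    """Decrypt after the download completes.

    Raises cryptography.fernet.InvalidToken on a wrong key or a tampered file.
    """
    Path(dst).write_bytes(Fernet(key).decrypt(Path(src).read_bytes()))
```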