I need to come up with a solution for a complicated file-transfer problem. I can do this myself, but I want to know if anybody knows of an open source solution that already does 90% of what I want to do.
The requirements are very odd. Don't try to understand them; they are a hellish mixture of politics, territory, and bureaucracy.
I control two servers, each of which grabs files from a group of upstream sources. I have some influence (but not total control) over the sources. My two servers collect these files and link new files into a processing directory (this is a bit simplified).
My two servers, let's call them A and B, now must send these files to a pair of servers downstream. I have almost no control over the downstream servers, let's call them X and Y.
- Files are uniquely identified by their filename. If it's got the same filename, it's the same file.
- There is a potentially endless flow of files. Their names contain a timestamp.
- Servers A and B (my servers) will typically get the same files. If a file shows up on server A, there is about a 98% chance it will also show up on server B with the same filename.
- A and B must push the files they receive to X and Y, using sftp or similar. I am not allowed to install software on X and Y. I am not allowed a shell account, even a restricted one. Now it gets weird:
- Each file received by A and/or B must be copied ONCE by A or B (but not both) to EITHER X or Y but not both.
- The sources upstream from me may contain duplicate copies of the same file (this isn't a problem for me at the A/B servers, each of them can keep track of what they pull).
- Failures of A, B, X, or Y must be tolerated (as long as the failed server's partner is still active). The flow of files (sources ==> A/B ==> X/Y) must not stop. (See the sketch after this list.)
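
For what it's worth, the routing part is the piece I feel most confident about: a deterministic hash of the filename can decide both "which of A/B should own this file" and "which of X/Y it should land on", so A and B don't need to talk to each other in the common case. Here's a minimal sketch in Python (the server names and the routing rule are placeholder assumptions of mine, not anything that has been agreed):

```python
import hashlib

MY_SERVERS = ["A", "B"]        # hypothetical names for my two servers
DOWNSTREAM = ["X", "Y"]        # hypothetical names for the downstream receivers

def stable_bucket(filename: str, buckets: int) -> int:
    """Map a filename to a bucket deterministically, the same way on A and B."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def plan_transfer(filename: str):
    """Decide which of my servers owns the file and which downstream host gets it.

    Because the decision depends only on the filename, A and B reach the same
    answer independently; the non-owner just keeps its copy as the local backup.
    """
    owner = MY_SERVERS[stable_bucket(filename, len(MY_SERVERS))]
    target = DOWNSTREAM[stable_bucket(filename + "|dest", len(DOWNSTREAM))]
    return owner, target

if __name__ == "__main__":
    for name in ["feed_20240101T120000_alpha.dat", "feed_20240101T120500_beta.dat"]:
        print(name, "->", plan_transfer(name))
```

The obvious gap is failure handling: if the owner dies, its partner has to take over that bucket, and if the chosen downstream host is down, the file has to divert to the other one without being sent twice, which is exactly where the trouble starts.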
What gets me about all of this is that my local department would like the files duplicated between A and B, for safety's sake, but the downstream receivers (a different department) insist that they want X and Y for failover ... yet each file must only be copied to one of them, never both (or only in rare situations). If the downstream people would just manage duplicate files, it would be easy(er). Given that filenames quickly identify duplication, it's really not hard. Oh well, they don't want to do that. Even though a failure of X or Y would potentially lose some files. Go figure.
So I'm working on an algorithm to do all of this, and I've made some progress, but it's going to be a little complicated to deal with race conditions, failure of nodes, restart of nodes, the mostly-independent nature of A and B, etc. I'm going to be a little upset if after a month of effort a friend says "Why didn't you just use SuperOpenSourceSolution? You could have got it working in one day!"
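
For the race-condition part, the direction I've been heading is to let the destination itself arbitrate: upload under a temporary name, then rename to the final name, and treat a failed rename (because the file already exists) as "my partner beat me to it". A rough sketch with paramiko, assuming the SFTP server refuses to rename onto an existing file (OpenSSH's default sftp-server behaviour) and that I'm allowed to leave and clean up temp names in the drop directory (both of which are assumptions, not confirmed facts):

```python
import os
import paramiko

def push_once(sftp: paramiko.SFTPClient, local_path: str, remote_dir: str, my_id: str) -> bool:
    """Upload local_path so that exactly one of the cooperating senders 'wins'.

    Returns True if this sender placed the final file, False if it was already
    there (the partner won, or a previous run of ours already delivered it).
    """
    name = os.path.basename(local_path)
    final = f"{remote_dir}/{name}"
    temp = f"{remote_dir}/.{name}.part.{my_id}"   # per-sender temp name avoids collisions

    # Cheap pre-check: if the final name already exists, someone delivered it.
    try:
        sftp.stat(final)
        return False
    except IOError:
        pass

    sftp.put(local_path, temp)
    try:
        # SFTP rename is assumed to fail if 'final' already exists, which is
        # exactly the arbitration we want between the two senders.
        sftp.rename(temp, final)
        return True
    except IOError:
        sftp.remove(temp)      # lost the race; discard our temp copy
        return False

if __name__ == "__main__":
    # Hypothetical connection details; hostname, user and key are placeholders.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("x.example.com", username="feeduser",
                   key_filename=os.path.expanduser("~/.ssh/id_ed25519"))
    sftp = client.open_sftp()
    delivered = push_once(sftp, "/data/outgoing/feed_20240101T120000_alpha.dat",
                          "/incoming", my_id="serverA")
    print("delivered" if delivered else "already there / partner won")
    client.close()
```

What this sketch glosses over is the coordination about who attempts the upload in the first place, and what happens when X is down and the file has to divert to Y without the other sender also diverting it. That's the part I expect to eat the month.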
So ... does anybody know of an out-of-the-box (or nearly so) solution? I know there are general MFT (managed file transfer) solutions out there, but I haven't heard that they can do this sort of thing.
I've had a look at rsync but I can't see how it would handle the weird distribution.
Thanks.