Why don't you just have an automated process of some kind (using cron, say) perform the syncing for you?
You can have a cron job monitoring a "Drop box" directory (or directories), and then it can run a script to perform the replication for you.
Or you can have the users submit the file with some metadata in order to better route the file once it's uploaded.
Simply put: never let the users "choose" where it goes; rather, have them tell you "what it's for", and then have your scripts "know" where things go and how to get them there.
It's a fairly straightforward web app to do, even with just some Perl CGI or whatever, and the back-end plumbing is straightforward as well.
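For example, here's a rough sketch of that back-end plumbing, assuming a made-up drop box at /var/dropbox, destinations under /srv/outgoing, and a one-line ".meta" sidecar file naming the route for each upload:
#!/bin/sh
# crontab entry (hypothetical path): check the drop box every five minutes
# */5 * * * * /usr/local/bin/route-uploads.sh
cd /var/dropbox || exit 1
for meta in *.meta
do
    [ -e "$meta" ] || continue            # no sidecar files, nothing to do
    file=${meta%.meta}                    # the upload this sidecar describes
    dest=`cat "$meta"`                    # e.g. "invoices" or "reports"
    mkdir -p "/srv/outgoing/$dest"
    mv "$file" "/srv/outgoing/$dest/" && rm "$meta"
done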
Answering comment...
If you have a web app performing the upload to CGI, then you typically don't even get "control" of the request until after the file has been fully uploaded. (It depends somewhat on what server-side tech you use.) In any case, with a web app it's easy to "know" when the file is fully uploaded. Then your sync process can rely solely on the metadata to actually do the work on the file, and you don't create the metadata until after you have moved the file into the appropriate staging area, etc.
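As a sketch of that ordering (the paths, the helper name, and the ".meta" convention are all made up), the upload handler could hand the finished file to something like:
#!/bin/sh
# Hypothetical helper the upload handler calls once the file is fully received.
#   $1 = path of the completed upload, $2 = what the user said it's "for"
upload=$1
purpose=$2
staging=/var/staging                      # made-up staging area
name=`basename "$upload"`
mv "$upload" "$staging/" || exit 1
# Write the metadata only *after* the file is safely in the staging area,
# so the sync process never sees a half-moved file.
echo "$purpose" > "$staging/$name.meta"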
If you are simply using FTP or scp to copy files up into staging directories, then the solution is to have two processes. The first monitors the incoming directory; the second actually copies files.
The first process can simply look like this:
cd /your/upload/dir
touch /tmp/lastfiles                                     # first run: start with an empty "last" list
ls -l > /tmp/newfiles                                    # snapshot of the upload directory right now
comm -12 /tmp/lastfiles /tmp/newfiles > /tmp/samefiles   # lines identical to last run = files that have stopped changing
filelist=`awk '{print $9}' /tmp/samefiles`               # pull just the filenames out of the ls -l lines
[ -n "$filelist" ] && mv $filelist /your/copy/dir        # move the settled files into the staging directory
mv /tmp/newfiles /tmp/lastfiles                          # this run's listing becomes next run's "last" list
This works like this:
- Grabs a list of the current files in the incoming upload directory.
- Uses comm(1) to get the files that have not changed since the last time the process was run.
- Uses awk(1) to get the unchanged file names.
- Uses mv(1) to move the files to your "staging" directory.
- Finally, it takes the current list of files and makes it the last list for the next run.
The magic here is comm(1). 'comm -12 filea fileb' gives you the lines that are the same between the two files. While a new file is still coming in, its size will change as it is uploaded, so when you run 'ls -l' the next minute, its line won't match the line from the previous run -- the size (minimally) will be different. So comm will only find files whose dates, filenames, and sizes have not changed. Once you have that list, the rest is pretty straightforward.
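If it helps, here's a toy run with made-up listings showing why a file that's still growing drops out of the result:
$ cat /tmp/lastfiles
-rw-r--r-- 1 web web 1048576 Jan 10 12:00 done.zip
-rw-r--r-- 1 web web 2097152 Jan 10 12:01 growing.zip
$ cat /tmp/newfiles
-rw-r--r-- 1 web web 1048576 Jan 10 12:00 done.zip
-rw-r--r-- 1 web web 4194304 Jan 10 12:02 growing.zip
$ comm -12 /tmp/lastfiles /tmp/newfiles
-rw-r--r-- 1 web web 1048576 Jan 10 12:00 done.zip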
The only assumption this process makes is that your filenames don't have spaces in them (so awk can trivially pull the file name out of each 'ls -l' line). If you allow spaces, you'll need a slightly more clever mechanism to convert an 'ls -l' line into the file name.
Also, the 'mv $filelist /your/copy/dir' assumes no spaces in the file names, so it too would need to be modified (you could roll it into the awk script, having it make a system() call, perhaps; a sketch of that follows).
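For instance (purely a sketch), you could let awk rebuild the name from field 9 onward and issue the mv itself via system(); note it still chokes on names containing quotes and collapses runs of blanks:
awk '{
    name = $9                                      # ls -l puts the name in field 9...
    for (i = 10; i <= NF; i++) name = name " " $i  # ...and beyond, if it contains spaces
    if (name != "")
        system("mv \"" name "\" /your/copy/dir/")
}' /tmp/samefiles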
The second process is also simple:
cd /your/copy/dir
for i in *
do
    sync $i                             # your own sync script that Does The Right Thing
    mv $i /your/file/youve/copied/dir   # once synced, park the original in the "done" directory
done
Again, the "no spaces in filenames assumption" here. This process relies on a sync shell script that you've written that Does The Right Thing. That's left as an exercise for the reader.
Once a file is synced, it gets moved to another directory; any files that show up there have been "synced" properly. You could also simply delete the file, but I tend not to do that. Instead, I'd point a "delete files older than a week" job at that directory. That way, if you encounter a problem, you still have the original files someplace you can recover from.
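That cleanup can just be another cron job using find(1), pointed at the same (hypothetical) directory:
# Daily cron job: throw away synced originals after a week
find /your/file/youve/copied/dir -type f -mtime +7 -exec rm -f {} \;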
This stuff is pretty simple, but it's also robust.
As long as the first process runs "slower" than the uploads (i.e., the gap between runs is long enough that a file still being uploaded will have changed size from one run to the next), the schedule can be every minute, every hour, every day, whatever. At a minimum, it's safely restartable and self-recovering.
The dark side of the second process is if your sync process takes longer than your cron schedule. If you run it every minute and it takes more than one minute to run, you'll end up with two processes copying the same files.
If your sync process is "safe", you'll just end up copying the files twice... a waste, but usually harmless.
You can mitigate that by using a locking technique like the one sketched below to ensure that no more than one instance of your copy script runs at a time.
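One such technique is to take a lock around the whole run, for example with flock(1) (the wrapped script path here is made up; an atomic mkdir lock works too if you don't have flock):
#!/bin/sh
# Cron runs this wrapper instead of the copy script itself.
# flock -n exits immediately if another run still holds the lock.
flock -n /var/lock/copyfiles.lock /usr/local/bin/copyfiles.sh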
That's the meat of it. You can also use a combination of the two (a web app to upload with the metadata, plus the syncing process running automatically via cron).
You can also have a simple web page that lists all of the files in /your/copy/dir so folks can see whether their files have been synced yet. If a file is still in that directory, it hasn't finished syncing.
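That page can be as dumb as a shell CGI that lists the directory (assuming your web server runs CGI scripts; the script name is made up):
#!/bin/sh
# pending.cgi -- plain-text page of files still waiting to be synced
echo "Content-Type: text/plain"
echo ""
echo "Files still waiting to be synced:"
echo ""
ls -l /your/copy/dir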