Let me explain the tree structure: I have a network directory where our database copies new .txt files several times a day. The files sit in directories named after usernames. On the local disk I have the same structure (one directory per username), and it needs to be updated with the latest .txt files. It's not a sync procedure: I copy the remote file to a local destination and don't care what happens to it after that, so I don't need to keep the two in sync. However, I do need to copy ONLY the new files, not the ones I have already copied. It looks something like this:

Remote disk

/mnt/remote/database
+ user1/
+ user2/
+ user3/
+ user4/

Local disk

/var/database
+ user1/
+ user2/
+ user3/
+ user4/

I played with

find /mnt/remote/database/ -type f -mtime +1

and other variants, but it's not working very well.
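For reference, find's -mtime counts a file's age in whole 24-hour periods with the fraction discarded. So -mtime +1 matches files last modified at least 48 hours ago (the opposite of "new"), while -mtime -1 matches files modified within the last 24 hours. The throwaway demo below shows both in a temporary directory; it assumes GNU touch, which accepts relative dates such as "3 days ago". Even -mtime -1 is a fixed window, so if cron runs drift it can miss files or copy them twice, which is why comparing against a reference file with -newer fits this job better.

```bash
#!/usr/bin/env bash
# Demo of find's -mtime semantics in a throwaway directory.
# Assumes GNU touch, which accepts relative dates such as "3 days ago".
set -euo pipefail

demo=$(mktemp -d)
touch "$demo/fresh.txt"                 # modified just now
touch -d "3 days ago" "$demo/old.txt"   # modified three days ago

# -mtime +1: age in whole days (fraction discarded) is greater than 1,
# i.e. last modified at least 48 hours ago.
older=$(find "$demo" -type f -name '*.txt' -mtime +1)

# -mtime -1: age in whole days is less than 1, i.e. modified in the last 24 hours.
recent=$(find "$demo" -type f -name '*.txt' -mtime -1)

echo "-mtime +1 matched: $older"
echo "-mtime -1 matched: $recent"
rm -rf "$demo"
```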

So, the script I am trying to write would:
1- check /mnt/remote/database recursively for *.txt files
2- check each file's date to see if it is new (since the last time I checked; maybe maintain a text file with the last check time as a reference?)
3- if the file is new, copy it to the proper destination in /var/database (so /mnt/remote/database/user1/somefile.txt is copied to /var/database/user1/)
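The three steps above could be sketched in bash roughly as follows. This is a minimal sketch under stated assumptions, not a finished script: the function name copy_new_txt and the stamp-file idea are hypothetical, it needs bash (for read -d '' and process substitution), and it assumes the file server's clock roughly agrees with the local one. Instead of writing the last check time into a text file, it keeps that time as the modification time of a stamp file, because find's -newer can compare against that directly.

```bash
#!/usr/bin/env bash
# copy_new_txt SRC DST STAMP  (hypothetical helper name)
# Copies every *.txt under SRC that is newer than the STAMP file to the same
# relative path under DST, e.g. SRC/user1/somefile.txt -> DST/user1/somefile.txt,
# then moves STAMP forward. Use absolute paths for all three arguments.
copy_new_txt() {
    local src=${1%/} dst=${2%/} stamp=$3
    local next_stamp="$stamp.new" file rel

    # If SRC is missing (e.g. the share is not mounted and the path does not
    # exist), stop before the stamp moves past files we never saw.
    [ -d "$src" ] || return 1

    # First run: no reference file yet, so date one 1980-01-01 and every
    # existing file counts as new.
    [ -e "$stamp" ] || touch -t 198001010000 "$stamp" || return 1

    # Take the next reference time *before* scanning, so a file written while
    # find runs is picked up again next time instead of being missed.
    touch "$next_stamp" || return 1

    # -print0 with read -d '' keeps names containing spaces intact.
    while IFS= read -r -d '' file; do
        rel=${file#"$src"/}                              # user1/somefile.txt
        mkdir -p "$dst/$(dirname "$rel")" || return 1    # DST/user1
        cp -p "$file" "$dst/$rel" || return 1            # -p keeps the mtime
    done < <(find "$src" -type f -name '*.txt' -newer "$stamp" -print0)

    # Advance the reference time only after every copy succeeded.
    mv "$next_stamp" "$stamp"
}
```

A script would end with a call such as `copy_new_txt /mnt/remote/database /var/database /var/tmp/copy_new_txt.stamp` (the stamp path is only an example), and a crontab line such as `*/15 * * * * /usr/local/bin/copy_new_txt.sh` (hypothetical path) would run it every 15 minutes. Because the new stamp is taken before the scan, a file written while find is running may be copied once more on the next run, but it is never skipped.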

I'll run the script through a cron job.

I'm doing this in C right now, but the IT staff are not very good at debugging or writing C. If they need to add or fix something, they can handle bash scripts better, although I am not very good at bash myself.

Any ideas out there?

Thank you!

+3  A: 

You could consider running rsync locally between the input and output directories. It has all the options you need to make its sync policy very flexible.

dweeves
OK, could you give me an example? Thank you.
Jessica
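rsync -urlc --include='*/' --include='*.txt' --exclude='*' /mnt/remote/database/ /var/database/ (the final --exclude='*' is what restricts the copy to .txt files; an --include on its own does not filter anything). For more information on rsync, see http://samba.anu.edu.au/ftp/rsync/rsync.html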
dweeves
Thank you, let me check it.
Jessica
Hmm, I can't find how to make it copy ONLY the files that are newer than a certain date.
Jessica
In fact, rsync only copies files that are missing or changed at the destination, so files already copied are skipped. If you need to exclude a specific period, you could use a find command like the one in the answer below (using -mtime +xx) and pipe its output to rsync with the option --exclude-from=-.
dweeves
+1  A: 
find /mnt/remote/database/ -type f -name '*.txt' -newer "$TIMESTAMP_FILE" -print0 | xargs -0 $CP_COMMAND
touch "$TIMESTAMP_FILE"
Randy Proctor
I understand the first part. What goes in $CP_COMMAND? Thanks.
Jessica
I would make `$CP_COMMAND` a script with its first command doing the `touch` in order to reduce race conditions.
Dennis Williamson
I think this is a little too blunt to avoid race conditions... rsync is the real answer.
Randy Proctor
OK, the scripts at http://www.movingtofreedom.org/2007/04/15/bash-shell-script-copy-only-files-modifed-after-specified-date/ are working great. Now I need to figure out how to recurse into the directories, grab each directory name, and pass it to the copy script...
Jessica
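For the recursion question, one option that avoids grabbing each directory name is to run find from inside the remote directory and hand the relative paths to cp. The sketch below is a hypothetical helper (copy_newer_than does not come from any answer here) that plays the role of $CP_COMMAND in the find | xargs answer. It assumes GNU coreutils and findutils (cp --parents, cp -t, xargs -r) and, as in that answer, leaves updating the timestamp file to the caller.

```bash
#!/usr/bin/env bash
# copy_newer_than SRC DST STAMP  (hypothetical helper name)
# One possible $CP_COMMAND, assuming GNU find/xargs/cp: cp --parents rebuilds
# the user1/, user2/ ... part of each path under DST, so there is no need to
# loop over the directories or pass their names around.
# All three arguments should be absolute paths, because find runs inside SRC.
copy_newer_than() {
    local src=$1 dst=$2 stamp=$3
    (
        cd "$src" || exit 1
        # Paths come out relative (./user1/somefile.txt), so cp --parents
        # recreates them as DST/user1/somefile.txt.
        # xargs -r skips running cp when find matched nothing.
        find . -type f -name '*.txt' -newer "$stamp" -print0 |
            xargs -0 -r cp -p --parents -t "$dst"
    )
}
```

Called as, for example, `copy_newer_than /mnt/remote/database /var/database "$TIMESTAMP_FILE"`, it copies /mnt/remote/database/user1/somefile.txt to /var/database/user1/somefile.txt, creating user1/ if needed.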