I have the following situation:

A Windows folder has been mounted on a Linux machine. This Windows mount may contain multiple folders, all set up beforehand. I have to do something (preferably a script to start with) to watch these folders.

These are the steps: watch for any incoming file(s), make sure they have been transferred completely, then move them to another folder. I do not have any control over the file transfer program on the Windows machine. I believe it is a secure FTP, so I cannot ask that process to send me a trailer file to confirm that the transfer has completed.

I have written a bash script. I would like to know about any potential pitfalls with this approach, because multiple copies of this script may be running at once, one per watched directory.

At the moment, up to 100 directories may have to be monitored.

Following is the script. I'm sorry for pasting a very long one here. Please take your time to review it and comment / criticize it. :-)

It takes 3 parameters: the folder to be watched, the folder the file has to be moved to, and a time interval, which is explained below.

I'm sorry there seems to be a problem with the alignment. Markdown doesn't seem to like it. I tried to organize it properly, but not able to do so.

Linux servername 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux

#!/bin/bash
log_this()
{
    message="$1"
    now=$(date "+%D-%T")
    # Quote the whole line so the message is not word-split or glob-expanded.
    echo "$$: $now: $message"
}
usage()
{
    cat << EOF
Usage: $0 <Directory to be watched> <Directory to transfer> <time interval>
Time interval is the amount of time after which the modification time of a
file will be monitored. 
EOF
    # Plain exit: inside backticks it would only end a subshell and the script would carry on.
    exit 1
}

if [ $# -lt 2 ]
then
    usage
fi

WATCH_DIR=$1
APP_DIR=$2

if [ ! -d "$WATCH_DIR" ]
then
    log_this "FATAL: WATCH_DIR, $WATCH_DIR does not exist. Exiting"
    exit 1
fi

if [ ! -d "$APP_DIR" ]
then
    log_this "APP_DIR: $APP_DIR does not exist. Exiting"
    exit 1
fi


# This needs to be set after considering the rate of file transfer.
# Represents the seconds elapsed after the last modification to the file.
# If not supplied as parameter, defaults to 3.

seconds_between_mods=$3

if ! [[ "$seconds_between_mods" =~ ^[0-9]+$ ]]; then
        if [ ${#seconds_between_mods} -eq 0 ]; then
                log_this "No value supplied for elapse time. Defaulting to 3."
                seconds_between_mods=3
        else
                log_this "Invalid value provided for elapse time"
                exit 1
        fi
fi

log_this "Start Monitor."

while true
do
    ls -1 "$WATCH_DIR" | while read -r file_name
    do
        log_this "Start Monitoring for $file_name"

        # Take the current time from a token file on the mount itself, so both
        # timestamps come from the same clock. If there is a time difference
        # between the servers, comparing against the local clock gets us in trouble.

        token_file="$WATCH_DIR/foo.$$"
        current_time=$(touch "$token_file" && stat -c "%Y" "$token_file")
        rm -f "$token_file" 2>/dev/null

        log_this "Current Time: $current_time"
        last_mod_time=$(stat -c "%Y" "$WATCH_DIR/$file_name")

        elapsed_time=$((current_time - last_mod_time))
        log_this "Elapsed time ==> $elapsed_time"

        if [ "$elapsed_time" -ge "$seconds_between_mods" ]
        then
            log_this "Moving $file_name to $APP_DIR"

            # If there is no space left on the target mount, hide the file
            # in the mount itself and remove the incomplete file from APP_DIR.
            if ! mv "$WATCH_DIR/$file_name" "$APP_DIR"
            then
                log_this "FATAL: mv failed!! Hiding $file_name"
                rm -f "$APP_DIR/$file_name"
                mv "$WATCH_DIR/$file_name" "$WATCH_DIR/.$file_name"
                log_this "Removed $APP_DIR/$file_name. Look for $WATCH_DIR/.$file_name and submit later."
            fi

            log_this "End Monitoring for $file_name"
        else
            log_this "$file_name: Transfer seems to be in progress"
        fi
    done
    log_this "Nothing more to monitor."
    echo
    sleep 5
done
+1  A: 

To be honest, a Python app set up to run at start-up will do this quickly and efficiently. Python has very good OS support, and it's rather complete.

Running the script will likely work, but it will be troublesome to maintain and manage. I take it you will run these as frequent cron jobs?

Recursion
Hmmm. I do not know Python. Can you point me to a Python package or module (I'm not sure about the terminology) that can help in doing this? Thanks.
prabhu
1. Follow the Python tutorial at www.python.org. 2. Look at the os and os.path modules. os.walk() (os.path.walk() in old Python 2 code) lets you examine a directory tree.
Aaron Digulla
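For anyone who wants to see what that could look like, here is a minimal sketch of the same "has the file been quiet long enough?" check in Python, using os.walk(). The function names and the min_age_seconds parameter are made up for illustration. Unlike the bash script, it compares against the local clock, so it assumes the two machines' clocks are roughly in sync. A quiet modification time is still only a guess that the transfer has finished, not proof.

```python
import os
import shutil
import time


def find_settled_files(watch_dir, min_age_seconds, now=None):
    """Return files under watch_dir not modified for at least min_age_seconds.

    Hypothetical helper: `now` can be passed in to make the check repeatable.
    """
    if now is None:
        now = time.time()  # assumption: local clock, not the mount's clock
    settled = []
    for dirpath, _dirnames, filenames in os.walk(watch_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                age = now - os.path.getmtime(full_path)
            except OSError:
                continue  # file disappeared between listing and stat
            if age >= min_age_seconds:
                settled.append(full_path)
    return sorted(settled)


def move_settled_files(watch_dir, app_dir, min_age_seconds, now=None):
    """Move every settled file into app_dir and return the new paths."""
    moved = []
    for path in find_settled_files(watch_dir, min_age_seconds, now):
        target = os.path.join(app_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved
```

Calling move_settled_files("/mnt/watch", "/data/app", 3) in a loop with a sleep would roughly mirror the bash script; both paths here are placeholders.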
+1  A: 

I believe a much saner approach would be to use a kernel-level filesystem notification mechanism such as inotify. There are also command-line tools for it.

lorenzog
inotify seems to be the proper way to do it. But at this point, I'm not in a position to patch our kernel to install it.
prabhu
+3  A: 

This isn't going to work for any length of time. In production, you will have network problems and other errors which can leave a partial file in the upload directory. I also don't like the idea of a "trailer" file. The usual approach is to upload the file under a temporary name and then rename it after the upload completes.

This way, you just have to list the directory, filter out the temporary names, and if there is anything left, use it.

If you can't make this change, then ask your boss for written permission to implement something which can lead to arbitrary data corruption. This serves two purposes: 1) to make them understand that this is a real problem and not something you made up, and 2) to protect yourself when it breaks ... because it will, and guess who'll get all the blame?
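As a rough sketch of that idea in Python: the ".part" suffix and both function names below are assumptions for illustration, and the uploading side would have to follow the same convention. It relies on a rename within one filesystem being atomic on POSIX systems, so the reader never sees a half-renamed file.

```python
import os

TEMP_SUFFIX = ".part"  # hypothetical naming convention agreed with the uploader


def completed_uploads(upload_dir, temp_suffix=TEMP_SUFFIX):
    """List regular files in upload_dir that do not carry the temporary suffix."""
    ready = []
    for name in sorted(os.listdir(upload_dir)):
        if name.endswith(temp_suffix):
            continue  # still being uploaded, or left over from a failed upload
        if os.path.isfile(os.path.join(upload_dir, name)):
            ready.append(name)
    return ready


def finish_upload(temp_path, temp_suffix=TEMP_SUFFIX):
    """Uploader side: strip the suffix once the transfer has completed."""
    if not temp_path.endswith(temp_suffix):
        raise ValueError("not a temporary upload name: %r" % temp_path)
    final_path = temp_path[: -len(temp_suffix)]
    os.rename(temp_path, final_path)
    return final_path
```

The watcher then only ever processes what completed_uploads() returns; anything still ending in the temporary suffix is either in flight or a failed upload that someone can clean up later.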

Aaron Digulla
A: 

You're on Stack Overflow, so you must have some programming experience. To get you started, here is a small app I wrote which takes a path and looks at the binary contents of JPEG files. I never quite finished it, but it will show you the structure of a Python program as well as some use of os.*. I wouldn't spend too much time worrying about my code.

import os
import shutil


# analyze() takes a path, looks in its output_files folder, and reads each file there.
def analyze(path):
    output_dir = os.path.join(path, "output_files")
    list_outputfiles = os.listdir(output_dir)
    print(list_outputfiles)
    for name in list_outputfiles:
        # Open relative to output_dir; the bare name would not be found from path.
        with open(os.path.join(output_dir, name), "rb") as f:
            f.read()


# txtmaker() copies the media file's raw bytes to output_files/<name>.txt.
def txtmaker(c_file):
    print(c_file)
    shutil.copyfile(c_file, os.path.join("output_files", c_file + ".txt"))


# parser() takes the entered path, creates the output directory, lists all
# files, then calls txtmaker() on everything that is not a directory.
def parser(path):
    path = os.path.abspath(path)
    os.chdir(path)
    if not os.path.isdir("output_files"):
        os.mkdir("output_files", 0o777)
    for name in os.listdir(path):
        if os.path.isdir(name):
            print(name, "is a directory")
        else:
            txtmaker(name)
    analyze(path)


def main():
    path = input("Enter the full path to the media: ")
    parser(path)


if __name__ == '__main__':
    main()
Recursion