I am trying to automate a download process, and I need to know whether a particular file has finished saving. The scenario is as follows.

  1. Open a site address using either Chrome or Firefox (any browser)
  2. Save the page to disk using 'Ctrl + S' (I work on Windows)
  3. If the page is very big, the save takes a few seconds. I want to parse the HTML once the save is complete.

Since I don't have control over the browser's save functionality, I can't tell whether the save has completed.

One idea I had is to compute the md5sum of the file in a while loop, compare it against the previously calculated one, and keep looping until the previous and current md5 sums match. I suspect this doesn't work, because the browser seems to save to a tmp file first and then copy the content to the specified file (or just rename the file).
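For reference, the polling idea could be sketched roughly as below. The names `file_md5` and `wait_until_stable`, the polling interval, and the timeout are all illustrative assumptions. The sketch waits for the target file to exist, which partly covers the temp-file case, but two matching digests only show that the file did not change between two polls, not that the browser has finished with it.

```python
import hashlib
import os
import time


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wait_until_stable(path, interval=1.0, timeout=60.0):
    """Poll until two consecutive MD5 digests of path match.

    Returns True once the digest stops changing, False on timeout.
    """
    deadline = time.monotonic() + timeout
    previous = None
    while time.monotonic() < deadline:
        if os.path.exists(path):
            try:
                current = file_md5(path)
            except OSError:
                current = None  # file may be locked or mid-rename
            if current is not None and current == previous:
                return True
            previous = current
        time.sleep(interval)
    return False
```

A longer interval makes a false "stable" result less likely when the browser writes in bursts, at the cost of a slower reaction.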

Any ideas? I use Python for the automation, so any idea that can be implemented in Python is welcome.

Thanks, Indrajith

+2  A: 

On Windows you can try to open the file in exclusive access mode to check whether it is being used (read or written) by another program. I've used this to wait for FTP uploads to complete on the server side. Here's the code:

import pywintypes
import win32file
import winerror

def check_file_ready(path):
    '''Check if file is not opened by another process.'''
    handle = None
    try:
        # Share mode 0 requests exclusive access, so the call fails with a
        # sharing violation while another process has the file open.
        handle = win32file.CreateFile(
            path,
            win32file.GENERIC_WRITE,
            0,
            None,
            win32file.OPEN_EXISTING,
            win32file.FILE_ATTRIBUTE_NORMAL,
            None)
        return True
    except pywintypes.error as e:
        if e.winerror == winerror.ERROR_SHARING_VIOLATION:
            # Note: other possible error codes include
            #  winerror.ERROR_FILE_NOT_FOUND
            #  winerror.ERROR_PATH_NOT_FOUND
            #  winerror.ERROR_ACCESS_DENIED.
            return False
        raise
    finally:
        if handle:
            win32file.CloseHandle(handle)

Note: this function re-raises all win32 errors except the sharing violation. You should check that the file exists beforehand, or check for the additional error codes in the function (see the comment inside the except block).
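To wait for the save to finish, a check like this would typically be called in a polling loop. The following is only a minimal sketch: `wait_until_ready`, its `is_ready` parameter, and the default interval and timeout are hypothetical names and values chosen for illustration. The check is passed in as a callable so the loop itself does not depend on pywin32.

```python
import time


def wait_until_ready(path, is_ready, interval=0.5, timeout=60.0):
    """Call is_ready(path) repeatedly until it returns True or timeout expires.

    Returns True if the check succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the readiness check above as `is_ready` gives a loop that returns once the file can be opened exclusively. Because that check re-raises errors such as a missing file, the caller would need to make sure the file exists before the loop starts.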

Nikita Nemkin