I find myself using Python for a lot of file management scripts like the one below. While looking for examples on the net, I am surprised by how little logging and exception handling is featured in them. Every time I write a new script my intention is not to end up with something like the one below, but if it deals with files then, no matter what, my paranoia takes over and the end result is nothing like the examples I see on the net. As I am a newbie I would like to know if this is normal or not. If not, how do you deal with the unknowns and the fear of deleting valuable info?

import logging
import msvcrt   # Windows-only: used for the "press any key" prompt
import os
import shutil
import sys
import time
import uuid

log = logging.getLogger(__name__)   # assumes logging is configured at startup

def flatten_dir(dirname):
    '''Flattens a given root directory by moving all files from its sub-directories and nested 
    sub-directories into the root directory and then deletes all sub-directories and nested 
    sub-directories. Creates a backup directory preserving the original structure of the root
    directory and restores this in case of errors.
    '''
    RESTORE_BACKUP = False
    log.info('processing directory "%s"' % dirname)
    backup_dirname = str(uuid.uuid4())
    try:
        shutil.copytree(dirname, backup_dirname)
        log.debug('directory "%s" backed up as directory "%s"' % (dirname,backup_dirname))
    except shutil.Error:
        log.error('shutil.Error: Error while trying to back up the directory')
        sys.stderr.write('the program is terminating with an error\n')
        sys.stderr.write('please consult the log file\n')
        sys.stderr.flush()
        time.sleep(0.25)
        print 'Press any key to quit this program.'
        msvcrt.getch()
        sys.exit()

    for root, dirs, files in os.walk(dirname, topdown=False):
        log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
        if root != dirname:
            for file in files:
                full_filename = os.path.join(root, file)
                try:
                    shutil.move(full_filename, dirname)
                    log.debug('"%s" moved to directory "%s"' % (file,dirname))
                except shutil.Error:
                    RESTORE_BACKUP = True
                    log.error('file "%s" could not be moved to directory "%s"' % (file,dirname))
                    log.error('flagging directory "%s" for reset' % dirname)
            if not RESTORE_BACKUP:
                try:
                    shutil.rmtree(root)
                    log.debug('directory "%s" deleted' % root)
                except shutil.Error:
                    RESTORE_BACKUP = True
                    log.error('directory "%s" could not be deleted' % root)
                    log.error('flagging directory "%s" for reset' % dirname)
        if RESTORE_BACKUP:
            break
    if RESTORE_BACKUP:
        RESTORE_FAIL = False
        try:
            shutil.rmtree(dirname)
        except shutil.Error:
            log.error('modified directory "%s" could not be deleted' % dirname)
            log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
            RESTORE_FAIL = True 
        if not RESTORE_FAIL:
            try:
                os.renames(backup_dirname, dirname)
                log.debug('back up of directory "%s" restored' % dirname)
                print '>'
                print '>******WARNING******'
                print '>There was an error while trying to flatten directory "%s"' % dirname
                print '>back up of directory "%s" restored' % dirname
                print '>******WARNING******'
                print '>'
            except WindowsError:
                log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
                log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname,dirname))
                print '>'
                print '>******WARNING******'
                print '>There was an error while trying to flatten directory "%s"' % dirname
                print '>back up of directory "%s" was NOT restored successfully' % dirname
                print '>no information is lost'
                print '>check the log file for information on manually restoring the directory'
                print '>******WARNING******'
                print '>'
    else:
        try:
            shutil.rmtree(backup_dirname)
            log.debug('back up of directory "%s" deleted' % dirname)
            log.info('directory "%s" successfully processed' % dirname)
            print '>directory "%s" successfully processed' % dirname
        except shutil.Error:
            log.error('backup directory "%s" could not be deleted' % backup_dirname)
            log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
            print '>'
            print '>******WARNING******'
            print '>directory "%s" successfully processed' % dirname
            print '>cleanup of backup directory "%s" failed' % backup_dirname
            print '>manual cleanup necessary'
            print '>******WARNING******'
            print '>'
+1  A: 

That seems reasonable to me. It depends on how important your data is.

I often start out like that, and have the logging be optional, with a flag set at the top of the file (or by the caller) turning the logging on or off. You could also have a verbosity level.

Generally, after something has been working for a while and is no longer in development, I stop reading the logs, and build up giant log files which I never read. However, if something does go wrong, it's good to know that they're there.
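
For example, a minimal sketch of that pattern, with the flag at the top of the file (the names here are illustrative, not from the script above):

LOGGING_ENABLED = True   # set here, or let the caller flip it

def log_msg(message):
    # hypothetical helper: write the message only when logging is on
    if LOGGING_ENABLED:
        print message

log_msg('about to move files...')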

Alex JL
That last paragraph happens to me a lot, too!
+3  A: 

It's ok to be a little paranoid. But there are different kinds of paranoia :). During the development phase, I use a lot of debug statements so I can see where I am going wrong (if I do go wrong). Sometimes I will leave these statements in, but use a flag to control whether they need to be displayed or not (pretty much a debug flag). You could also have a "verbosity" flag to control how much logging you do.

The other type of paranoia comes with sanity checks. This paranoia comes into play when you rely on external data or tools - pretty much anything that doesn't come out of your program. In this case, it never hurts to be paranoid (especially with data that you receive - never trust it).

It's also ok to be paranoid if you're checking to see if a particular operation completed successfully. This is just part of normal error-handling. I notice that you're performing functions like deleting directories and files. These are operations that could potentially fail, and so you must deal with the scenario where they fail. If you simply ignore it, your code could end up in an indeterminate/undefined state and could potentially do bad (or at the least, undesirable) things.
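
For instance, a minimal sketch of that kind of sanity check on external input (the specific checks are illustrative):

import os
import sys

def check_dirname(dirname):
    # never trust a path handed in from outside the program
    if not os.path.exists(dirname):
        sys.exit('error: "%s" does not exist' % dirname)
    if not os.path.isdir(dirname):
        sys.exit('error: "%s" is not a directory' % dirname)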

As far as the log files and debug files are concerned, you can leave them in if you wish. I usually perform a decent amount of logging; just enough to tell me what's going on. Of course, that is subjective. The key is to make sure you don't drown yourself in logging, with so much information that you can't pick out what matters. Logging in general helps you figure out what went wrong when a script you wrote suddenly stops working. Instead of stepping through the program to figure it out, you can get a rough idea of where the problem is by going through your logs.
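
As a concrete example, here is a minimal sketch of a log-file setup with a configurable verbosity, using the standard logging module (the filename and level shown are placeholders):

import logging

# DEBUG gives maximum detail during development; switch to INFO or
# WARNING once the script has stabilised, to keep the log readable
logging.basicConfig(filename='flatten.log',
                    level=logging.DEBUG,
                    format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger(__name__)

log.debug('fine-grained detail for development')
log.info('normal progress messages')
log.error('something went wrong')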

Vivin Paliath
+1 for the verbose flag
RQDQ
using a global `DEBUG_LEVEL = 0b001` or however many levels you need is a great way to do it. Then you just have a `debug(message, level)` helper along the lines of `if level < DEBUG_LEVEL: print(message)`. Then instead of `print mytestVal` you can type `debug(mytestVal, 0b010)` - of course you can also define constants like `TESTING`, etc. at the beginning of your file... And of course you can also mask instead of doing `>` comparisons...
Wayne Werner
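
A minimal sketch of the masked debug-level helper Wayne Werner describes (all names and levels here are illustrative):

# bit flags for different categories of debug output
DEBUG_FILE_OPS = 0b001
DEBUG_VERBOSE  = 0b010
DEBUG_LEVEL    = DEBUG_FILE_OPS   # enable only file-operation messages

def debug(message, level):
    # masking: show the message if any of its category bits are enabled
    if level & DEBUG_LEVEL:
        print message

debug('moving foo.txt', DEBUG_FILE_OPS)   # printed
debug('walk details...', DEBUG_VERBOSE)   # suppressed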
+6  A: 

Learning to let go (or how I learned to live with the bomb)...

Ask yourself this: what exactly are you afraid of, and how will you handle it if it happens? In the example that you provide you want to avoid data-loss. The way that you've handled it is by looking for every combination of conditions that you think is an error and placing huge amounts of logging over it. Things will still go wrong and it's not clear that having a large amount of logging will be a good way to deal with it. Sketching out what you are trying to achieve:

for each file in a tree
  if file is below the root
    move it into the root
if nothing went wrong
  delete empty subtrees

So what kind of things could go wrong in this process? Well, there are many ways in which the move file operations could barf due to the underlying file-system. Can we list them all and provide nice ways to deal with them? No... but in general you are going to deal with them all the same way. Sometimes an error is just an error regardless of what it is.

So in this case if any error occurs then you want to abort and undo any changes. The way that you've decided to do that is by creating a backup copy and restoring it when something goes wrong. But your most likely error is the file-system being full, in which case these steps are likely to fail. OK, so it is a common enough problem - if you are worried about unknown errors at any point, how do you stop your restoration path from going wrong?

The general answer is make sure that you do any intermediate work first, and then take a single troublesome (hopefully atomic) step. In your case you need to flip your recovery around. Instead of building a copy as a backup, build a copy of the result. If everything succeeds you can then swap the new result in for the old original tree. Or, if you are really paranoid you can leave that step for a human. The advantage here is that if something goes wrong you can just abort and throw away the partial state that you have constructed.

Your structure then becomes:

make empty result directory
for every file in the tree
  copy file into new result
on failure abort; otherwise
  move result over old source directory

By the way, there is a bug in your current script that this pseudocode makes more obvious: if you have files with identical names in different branches they will overwrite each other in the new flattened version.
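
For illustration, a hedged sketch of one way to guard against those collisions when building the flattened result (the renaming strategy shown is just one option):

import os

def safe_target(result_dir, filename):
    # if a file with this name is already in the flattened result,
    # append a counter rather than silently overwriting it
    target = os.path.join(result_dir, filename)
    base, ext = os.path.splitext(filename)
    counter = 1
    while os.path.exists(target):
        target = os.path.join(result_dir, '%s_%d%s' % (base, counter, ext))
        counter += 1
    return target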

The second point about this pseudocode is that all of the error handling is in the same place (i.e. wrap the make-new-directory and recursive copy inside a single try block and catch all the errors after it); this solves your original issue about the large ratio of logging / error-checking to actual work code.

import os, shutil, sys, uuid

result_dirname = str(uuid.uuid4())   # scratch directory for the flattened result
try:
    os.mkdir(result_dirname)
    for root, dirs, files in os.walk(dirname, topdown=False):
        for file in files:
            full_filename = os.path.join(root, file)
            # NOTE: identically named files from different branches still
            # collide here; add a check as discussed above
            target_filename = os.path.join(result_dirname, file)
            shutil.copy(full_filename, target_filename)
except Exception, e:
    print >>sys.stderr, "Something went wrong: %s" % e
    sys.exit(1)
# swap the result in for the original
shutil.rmtree(dirname)
os.rename(result_dirname, dirname)      # I would do this bit by hand really
Amoss
+1 for re-framing the solution to make it cleaner.
Andrew Myers
Thinking about it, this is clearly the most logical way. Since every error is dealt with by totally restoring the previous state, your solution makes the most sense: lump all the steps liable to cause errors into a single catching mechanism. It even makes it easier to implement a CRC check on the files, for extra paranoia, before deleting the original data. In my code I was using shutil.move, which raises an exception on files with equal names, so no overwriting occurred. Here a check for equal filenames must be performed, but it is trivial really.
LtPinback
+2  A: 

Paranoia can definitely obscure what your code is trying to do. That is a very bad thing, for several reasons. It hides bugs. It makes the program harder to modify when you need it to do something else. It makes it harder to debug.

Assuming Amoss can't cure you of your paranoia, here is how I might rewrite the program. Note that:

  • Each block of code that contains a lot of paranoia is split out into its own function.

  • Each time an exception is caught, it is re-raised, until it is finally caught in the main function. This eliminates the need for variables like RESTORE_BACKUP and RESTORE_FAIL.

  • The heart of the program (in flatten_dir) is now just 17 lines long and paranoia-free.


import logging
import msvcrt   # Windows-only: used for the "press any key" prompt
import os
import shutil
import sys
import time
import uuid

log = logging.getLogger(__name__)   # assumes logging is configured at startup

def backup_tree(dirname, backup_dirname):
    try:
        shutil.copytree(dirname, backup_dirname)
        log.debug('directory "%s" backed up as directory "%s"' % (dirname,backup_dirname))
    except:
        log.error('Error trying to back up the directory')
        raise

def move_file(full_filename, dirname):
    try:
        shutil.move(full_filename, dirname)
        log.debug('"%s" moved to directory "%s"' % (full_filename, dirname))
    except:
        log.error('file "%s" could not be moved to directory "%s"' % (full_filename, dirname))
        raise

def remove_empty_dir(dirname):
    try:
        os.rmdir(dirname)
        log.debug('directory "%s" deleted' % dirname)
    except:
        log.error('directory "%s" could not be deleted' % dirname)
        raise

def remove_tree_for_restore(dirname, backup_dirname):
    try:
        shutil.rmtree(dirname)
    except:
        log.error('modified directory "%s" could not be deleted' % dirname)
        log.error('manual restoration from backup directory "%s" necessary' % backup_dirname)
        raise

def restore_backup(backup_dirname, dirname):
    try:
        os.renames(backup_dirname, dirname)
        log.debug('back up of directory "%s" restored' % dirname)
        print '>'
        print '>******WARNING******'
        print '>There was an error while trying to flatten directory "%s"' % dirname
        print '>back up of directory "%s" restored' % dirname
        print '>******WARNING******'
        print '>'
    except:
        log.error('backup directory "%s" could not be renamed to original directory name' % backup_dirname)
        log.error('manual renaming of backup directory "%s" to original directory name "%s" necessary' % (backup_dirname,dirname))
        print '>'
        print '>******WARNING******'
        print '>There was an error while trying to flatten directory "%s"' % dirname
        print '>back up of directory "%s" was NOT restored successfully' % dirname
        print '>no information is lost'
        print '>check the log file for information on manually restoring the directory'
        print '>******WARNING******'
        print '>'
        raise

def remove_backup_tree(backup_dirname, dirname):
    try:
        shutil.rmtree(backup_dirname)
        log.debug('back up of directory "%s" deleted' % dirname)
        log.info('directory "%s" successfully processed' % dirname)
        print '>directory "%s" successfully processed' % dirname
    except shutil.Error:
        log.error('backup directory "%s" could not be deleted' % backup_dirname)
        log.error('manual deletion of backup directory "%s" necessary' % backup_dirname)
        print '>'
        print '>******WARNING******'
        print '>directory "%s" successfully processed' % dirname
        print '>cleanup of backup directory "%s" failed' % backup_dirname
        print '>manual cleanup necessary'
        print '>******WARNING******'
        print '>'
        raise

def flatten_dir(dirname):
    '''Flattens a given root directory by moving all files from its sub-directories and nested 
    sub-directories into the root directory and then deletes all sub-directories and nested 
    sub-directories. Creates a backup directory preserving the original structure of the root
    directory and restores this in case of errors.
    '''
    log.info('processing directory "%s"' % dirname)
    backup_dirname = str(uuid.uuid4())
    backup_tree(dirname, backup_dirname)
    try:
        for root, dirs, files in os.walk(dirname, topdown=False):
            log.debug('os.walk passing: (%s, %s, %s)' % (root, dirs, files))
            if root != dirname:
                for file in files:
                    full_filename = os.path.join(root, file)
                    move_file(full_filename, dirname)
                remove_empty_dir(root)
    except:
        remove_tree_for_restore(dirname, backup_dirname)
        restore_backup(backup_dirname, dirname)
        raise
    else:
        remove_backup_tree(backup_dirname, dirname)

def main(dirname):
    try:
        flatten_dir(dirname)
    except:
        import traceback
        log.exception('error flattening directory "%s"' % dirname)
        traceback.print_exc()
        sys.stderr.write('the program is terminating with an error\n')
        sys.stderr.write('please consult the log file\n')
        sys.stderr.flush()
        time.sleep(0.25)
        print 'Press any key to quit this program.'
        msvcrt.getch()
        sys.exit()
Jason Orendorff
Amoss's solution seems more fitting, but using functions to clear the core code mechanism of all the exceptions and logging is quite a useful technique which I will be implementing for sure in my code from now on.
LtPinback
A: 

If it's OK to leave the job half-done on error (only some files moved), as long as no files are lost, then the backup directory is unnecessary. So you can write dramatically simpler code:

import os, logging

def flatten_dir(dirname):
    for root, dirs, files in os.walk(dirname, topdown=False):
        # topdown=False visits (and removes) subdirectories before their
        # parents, so any directories listed here should be gone from disk
        assert not any(os.path.exists(os.path.join(root, d)) for d in dirs)
        if root != dirname:
            for file in files:
                full_filename = os.path.join(root, file)
                target_filename = os.path.join(dirname, file)
                if os.path.exists(target_filename):
                    raise Exception('Unable to move file "%s" because "%s" already exists'
                                    % (full_filename, target_filename))
                os.rename(full_filename, target_filename)
            os.rmdir(root)

def main(somedir):
    try:
        flatten_dir(somedir)
    except:
        logging.exception('Failed to flatten directory "%s".' % somedir)
        print "ERROR: Failed to flatten directory. Check log files for details."

Each individual system call here makes progress without destroying data that you wanted to keep. There's no need for a backup directory because there's never anything you need to "recover".

Jason Orendorff