The Python code below takes the contents of a workspace folder and zips them up. The only file geodatabase (file-based database) I need is called "Data", so how can I modify the loop to include only that one? To be more specific, a file geodatabase is stored as a system folder containing binary files that store and manage spatial data, so I need the entire system folder called Data.gdb.

Many Thanks

#**********************************************************************
# Description:
#    Zips the contents of a folder, file geodatabase or ArcInfo workspace
#    containing coverages into a zip file.
# Parameters:
#   0 - Input workspace
#   1 - Output zip file. It is assumed that the caller (such as the
#       script tool) added the .zip extension.
#
#**********************************************************************

# Import modules and create the geoprocessor
import sys, zipfile, arcgisscripting, os, traceback
gp = arcgisscripting.create()

# Function for zipping files 
def zipws(path, zip):
    isdir = os.path.isdir

    # Check the contents of the workspace. If the current item is a
    # directory, get its contents and write them to the zip file;
    # otherwise write the current file item to the zip file
    #
    for each in os.listdir(path):
        fullname = path + "/" + each
        if not isdir(fullname):
            # If the workspace is a file geodatabase, avoid writing out lock
            # files as they are unnecessary
            #
            if not each.endswith('.lock'):
                # gp.AddMessage("Adding " + each + " ...")
                # Write out the file and give it a relative archive path
                #
                try: zip.write(fullname, each)
                except IOError: None # Ignore any errors in writing file
        else:
            # Branch for sub-directories
            #
            for eachfile in os.listdir(fullname):
                # Test the full path; a bare name would be resolved against
                # the current working directory
                if not isdir(fullname + "/" + eachfile):
                    # Check the file inside the sub-directory, not its parent
                    if not eachfile.endswith('.lock'):
                        # gp.AddMessage("Adding " + eachfile + " ...")
                        # Write out the file and give it a relative archive path
                        #
                        try: zip.write(fullname + "/" + eachfile, \
                                       os.path.basename(fullname) + "/" + eachfile)
                        except IOError: None # Ignore any errors in writing file


if __name__ == '__main__':
    try:
        # Get the tool parameter values
        #
        inworkspace = sys.argv[1]
        outfile = sys.argv[2]     

        # Create the zip file for writing compressed data
        #
        zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED)
        zipws(inworkspace, zip)
        zip.close()

        # Set the output derived parameter value for models
        #
        gp.setparameterastext(1, outfile)
        gp.AddMessage("Zip file created successfully")

    except:
        # Return any python specific errors as well as any errors from the geoprocessor
        #
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n    " + \
                str(sys.exc_type)+ ": " + str(sys.exc_value) + "\n"
        gp.AddError(pymsg)

        msgs = "GP ERRORS:\n" + gp.GetMessages(2) + "\n"
        gp.AddError(msgs)
A: 

The best way to walk over a directory tree is os.walk -- it separates files from directories for you, and also handles the recursion down into subdirectories.

So:

def zipws(path, zip, filename='Data.gdb'):
  for root, dirs, files in os.walk(path):
    if filename in files:
      zip.write(os.path.join(root, filename),
                os.path.join(os.path.basename(root), filename))
      return

I'm not certain I've captured the entire logic with which you want to determine the two arguments to zip.write (it's not obvious to me from your code), but, if not, that should be easy to adjust.

Also, I'm not sure if you want that return at the end: the effect is zipping only one file named that way, as opposed to zipping all files named that way that may occur in the tree (in their respective subdirectories). If you know there's only one such file, may as well leave the return in (it will just speed things up a bit). If you want all such files when there's more than one, remove the return.

Edit: turns out that the "one thing" the OP wants is a directory, not a file. In that case, I would suggest, as the simplest solution:

def zipws(path, zip, dirname='Data.gdb'):
  for root, dirs, files in os.walk(path):
    if os.path.basename(root) != dirname: continue
    for filename in files:
      zip.write(os.path.join(root, filename),
                os.path.join(dirname, filename))
    return

Again, this involves a similar guess about what exactly you want to use for the archive name.
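
For readers who also want to keep the question's rule of skipping .lock files, the directory-based approach above might be sketched as follows. This is only an illustration under assumptions: the function name zip_named_dir is made up for this example, it targets Python 3 (it uses ZipFile as a context manager), it opens the zip file itself instead of receiving an open one, and it also descends into any folders nested inside the matched directory, which the version above does not do.

```python
import os
import zipfile


def zip_named_dir(workspace, zip_path, dirname="Data.gdb"):
    """Zip the files under the first directory called `dirname` in `workspace`.

    Hypothetical helper for illustration; returns the archive names written.
    """
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(workspace):
            if os.path.basename(root) != dirname:
                continue
            # Walk the matched directory itself so nested folders are kept too.
            for sub_root, _sub_dirs, sub_files in os.walk(root):
                for name in sorted(sub_files):
                    if name.endswith(".lock"):
                        continue  # lock files are not needed in the archive
                    full = os.path.join(sub_root, name)
                    # Archive path: <dirname>/<path relative to the match>
                    rel = os.path.relpath(full, root)
                    arcname = os.path.join(dirname, rel)
                    archive.write(full, arcname)
                    written.append(arcname.replace(os.sep, "/"))
            break  # stop after the first matching directory
    return written
```

A call would look like zip_named_dir(inworkspace, outfile), replacing the separate ZipFile(...), zipws(...) and zip.close() steps in the question's main block. Drop the break if every directory with that name should be included.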

Alex Martelli
Hi Alex, I updated my question above to provide more detail on Data.gdb (it is a system folder with multiple binary files). I also commented out my entire zipws function, added your code, and changed zipws(inworkspace, zip) to zipws(inworkspace, zip, filename), but when I ran it I got a syntax error. Thoughts?
Josh
@Josh, re the latter, I just typoed away a closed-paren -- editing to fix. If the one thing you're looking for is a directory, not a file, simplest IMHO is to look for the basename of root -- since I'm editing anyway I'll show the new version.
Alex Martelli
Nice job Alex. That worked.
Josh
@Josh, always glad to help!
Alex Martelli
A: 

Start at this line:

    zipws(inworkspace, zip)

You don't want to use this function to build the zip file from multiple files. It appears you want to build a zip file with just one member.

Replace it with this.

    try:
        zip.write(os.path.join('.', 'Data.gdb'))
    except IOError:
        pass # Ignore any errors in writing file

Throw away the zipws function which you -- apparently -- don't want to use.

Read this, it may help: http://docs.python.org/library/zipfile.html
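
One caveat if Data.gdb turns out to be a directory rather than a single file: as far as I know, on current Python 3 versions ZipFile.write given a directory path stores only an entry for the directory itself and does not recurse into it. The small check below is a sketch for confirming that on your own interpreter; the helper name members_after_writing_dir and the placeholder file a.gdbtable are made up for the demonstration.

```python
import os
import tempfile
import zipfile


def members_after_writing_dir(dir_path, zip_path):
    """Pass a directory straight to ZipFile.write and report what was stored."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(dir_path, os.path.basename(dir_path))
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        gdb = os.path.join(tmp, "Data.gdb")
        os.makedirs(gdb)
        # Placeholder standing in for the geodatabase's binary files.
        with open(os.path.join(gdb, "a.gdbtable"), "w") as handle:
            handle.write("placeholder")
        print(members_after_writing_dir(gdb, os.path.join(tmp, "out.zip")))
```

If the printed list contains no a.gdbtable member, the files inside the folder have to be added one by one, for example by walking the directory.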

S.Lott
Data.gdb consists of multiple files, as it is a file-based database. Will that be an issue?
Josh
@Josh: You should **update** your question to be very specific on this point. It's not clear what this name means. Is it a "directory"? Or is it a "file"? Please actually update your question to provide the missing information.
S.Lott
updated, thanks
Josh