Hello,

I'm using the following code to extract a tar file:

import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()

However, I'd like to track the progress by seeing which file is being extracted at any given moment. How can I do this?

EXTRA BONUS POINTS: is it possible to get the extraction progress as a percentage as well? I'd like to use it to update a tkinter progress bar. Thanks!

+2  A: 

You could use extract instead of extractall - you would be able to print the member names as they are being extracted. To get a list of members, you could use getmembers.
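
A minimal Python 3 sketch of this idea follows. The function name `extract_with_names` and the percentage-by-member-count formula are illustrative assumptions, not part of the original answer; only `tarfile.open`, `getmembers` and `extract` come from the standard library.

```python
import tarfile


def extract_with_names(archive_path, dest="."):
    """Extract members one at a time, printing each name and a percentage.

    Hypothetical helper: the percentage counts members, not bytes.
    """
    extracted = []
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()  # scans the whole archive up front
        for index, member in enumerate(members, start=1):
            tar.extract(member, path=dest)
            percent = 100.0 * index / len(members)
            print("%5.1f%% %s" % (percent, member.name))
            extracted.append(member.name)
    return extracted
```

Calling `extract_with_names("sample.tar.gz")` would print one line per member as it is written. Note that `getmembers` has to read through the entire archive before the first file is extracted, which can take a while for large compressed archives.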

A textual progressbar library can be found here:

Tkinter snippet:

The MYYN
Looking at the code, "extractall" calls "extract", so there should be no speed penalty.
tokland
thanks, removed my uneducated 'guess' ...
The MYYN
+1  A: 
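import tarfile

# Python 2 code: called on every read with the member's name, offset and size.
def on_progress(filename, position, total_size):
    print "%s: %d of %d" % (filename, position, total_size)

class MyFileObject(tarfile.ExFileObject):
    def read(self, size, *args):
        on_progress(self.name, self.position, self.size)
        return tarfile.ExFileObject.read(self, size, *args)

# Make TarFile use the reporting file object for every extracted member.
tarfile.TarFile.fileobject = MyFileObject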

Check the tarfile.py module source for more details; the standard library is pretty well written and less scary than you would think.

Edit1: removed monkey-patching (or is this still monkey-patching?); it turns out you can set your own file object.

Edit2: To get an overall byte progress, use the fileobj argument:

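import os
import tarfile

total_size = os.path.getsize("a.tgz")

# Python 2 code: `file` is the built-in file type.
class MyFileObj(file):
    def read(self, size):
        # The position is measured in (compressed) bytes of the archive file.
        print "%d of %d" % (self.tell(), total_size)
        return file.read(self, size)

# Open in binary mode so the archive is read correctly on every platform.
tar = tarfile.open(fileobj=MyFileObj("a.tgz", "rb"))
tar.extractall()
tar.close()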
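
The snippet above relies on the Python 2 `file` type. A rough Python 3 equivalent might subclass `io.FileIO` instead, as sketched below. The names `ProgressFile` and `extract_with_byte_progress` and the `callback(position, total)` hook are assumptions made for illustration, and the sketch assumes a gzip-compressed archive.

```python
import io
import os
import tarfile


class ProgressFile(io.FileIO):
    """Raw binary file that reports its position after every read()."""

    def __init__(self, path, callback):
        super().__init__(path, "r")  # FileIO is always binary
        self._total = os.path.getsize(path)
        self._callback = callback  # hypothetical hook: callback(position, total)

    def read(self, size=-1):
        data = super().read(size)
        # Position counts compressed bytes consumed from the archive file.
        self._callback(self.tell(), self._total)
        return data


def extract_with_byte_progress(archive_path, dest=".", callback=print):
    """Extract a .tar.gz while reporting how much of the archive was read."""
    with ProgressFile(archive_path, callback) as raw:
        with tarfile.open(fileobj=raw, mode="r:gz") as tar:
            tar.extractall(dest)
```

Because the decompressor reads the archive in buffered chunks, a small archive may jump to 100% after very few callbacks; the granularity only becomes useful for larger files.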
tokland
This is still monkeypatching. `:)`
Mike Graham
Thanks tokland, this works :) Is there any way of getting the overall extraction progress as a float?
FLX
To be more specific, is there a way of getting the uncompressed size before starting the extraction process?
FLX
@Mike: is this considered to be monkeypatching? I assumed that, with tarfile.TarFile being a "public" class (no _underscore) of the module and fileobject a "public" class attribute (again, no underscore), you can safely play with them. But I am not really familiar with Python policy in this regard.
tokland
@FLX. I am afraid that using the code above you cannot get the total percentage with byte granularity. You could have two progress bars: the overall progress (file granularity) and the current file progress (byte granularity).
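
For the overall bar, one option is to weight each file by its uncompressed size, which each member records in its `size` attribute. The Python 3 sketch below is only an illustration: `extract_with_fraction` and the `on_progress(name, fraction)` hook are hypothetical names, and the fraction is updated once per file rather than per byte.

```python
import tarfile


def extract_with_fraction(archive_path, dest=".", on_progress=print):
    """Extract member by member, reporting overall progress as a float in [0, 1]."""
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        # Total uncompressed payload, known before any file is written.
        total = sum(member.size for member in members)
        done = 0
        for member in members:
            tar.extract(member, path=dest)
            done += member.size
            fraction = done / total if total else 1.0
            on_progress(member.name, fraction)
```

The reported fraction could then be scaled to whatever range a tkinter progress bar expects.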
tokland
@FLX. I edited the answer to add the overall progress code. I think it now covers it all.
tokland
@tokland, `TarFile.fileobject` is a typically fixed piece of global state that you modify to change its behavior where you use it in your code (and you end up modifying it for everyone else `=p`). If not monkeypatching, this is something close to it. The underscore convention is not the primary means of marking internal attributes in Python; *documentation* is. I doubt the decision to name it `fileobject` was because the implementor thought, "Oh, what a nice API for someone to replace this for their needs". If it was, I seriously doubt their object-oriented design skills.
Mike Graham
@Mike, yeah, that sounds reasonable. I'd just go with the code that creates a custom file object so as not to tweak the tarfile module.
tokland