Hello,

I need to download a zip archive of text files, dispatch each text file in the archive to other handlers for processing, and finally write the unzipped text file to disk.

I have the following code. It opens and closes the same file several times, which does not seem elegant. How can I make it more elegant and efficient?

import cStringIO
import urllib
import zipfile

zipped = urllib.urlopen('http://www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    handler1(logfile)
    logfile.close()   # Cannot seek(0): the file-like object does not support seek()
    logfile = unzipped.open(f_info)
    handler2(logfile)
    logfile.close()
    unzipped.extract(f_info)
+1  A: 

You could say something like:

handler_dispatch(logfile)

and

def handler_dispatch(file):
    for line in file:
        handler1(line)
        handler2(line)

Or make it more dynamic by building a Handler class that holds several handlerN functions and applies each of them inside handler_dispatch, like this:

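class Handler:
    def __init__(self):
        self.handlers = []

    def add_handler(self, handler):
        self.handlers.append(handler)

    def handler_dispatch(self, file):
        # Feed each line to every registered handler in turn.
        for line in file:
            for handler in self.handlers:
                # Each handler is a callable such as handler1 or handler2.
                handler(line)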
danben
+1: If the handlers do not need access to the entire file at one time, then this is a very nice solution.
D.Shawley
Yes, I should have made that explicit. Thanks.
danben
That's an interesting idea, danben. But my handlers need to keep track of state while processing the text files, so feeding them one line at a time will not work without some modifications.
hli
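
One possible modification, sketched here under assumptions: if each handler is an object that keeps its own state between calls, the line-by-line dispatch can still work. ErrorCounter and LongestLine are hypothetical handlers invented for illustration; they do not come from the question, and the real handlers may need more than this.

```python
class ErrorCounter:
    """Hypothetical stateful handler: counts lines containing 'ERROR'."""

    def __init__(self):
        self.count = 0

    def __call__(self, line):
        if "ERROR" in line:
            self.count += 1


class LongestLine:
    """Hypothetical stateful handler: remembers the longest line seen."""

    def __init__(self):
        self.longest = ""

    def __call__(self, line):
        stripped = line.rstrip("\n")
        if len(stripped) > len(self.longest):
            self.longest = stripped


def handler_dispatch(lines, handlers):
    # Feed every line to every handler; each handler keeps its own state.
    for line in lines:
        for handler in handlers:
            handler(line)


sample = ["ok\n", "ERROR disk full\n", "ERROR\n"]
counter, longest = ErrorCounter(), LongestLine()
handler_dispatch(sample, [counter, longest])
```

Because the state lives on the handler instances, the results can be read from counter.count and longest.longest once the file has been consumed.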
+1  A: 

Open the zip file once, loop through all the names, extract the file for each name and process it, then write it to disk.

Like so:

for f_info in unzipped.infolist():
    member = unzipped.open(f_info)
    data = member.read()
    member.close()
    # If you need a file-like object, wrap the data in a cStringIO
    fobj = cStringIO.StringIO(data)
    handler1(fobj)
    fobj.seek(0)  # rewind so handler2 starts at the beginning
    handler2(fobj)
    with open(f_info.filename, "wb") as fp:
        fp.write(data)

You get the idea.

Bryan Ross
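
A variation on the idea above, as a rough sketch assuming Python 3 (where io.BytesIO takes the place of cStringIO.StringIO): read each member once, then give every handler its own fresh file-like object over the same bytes, so no handler depends on another one rewinding. dispatch_member is a hypothetical helper name, and the archive is built in memory only to keep the demo self-contained.

```python
import io
import zipfile


def dispatch_member(archive, info, handlers):
    """Read one archive member once and hand each handler its own file object."""
    data = archive.read(info)  # uncompressed bytes of the member
    for handler in handlers:
        # A fresh BytesIO per handler always starts at position 0.
        handler(io.BytesIO(data))
    return data


# Demo: build a small archive in memory instead of downloading one.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.log", "one\ntwo\n")

seen = []
with zipfile.ZipFile(buf) as unzipped:
    for f_info in unzipped.infolist():
        dispatch_member(
            unzipped,
            f_info,
            [lambda f: seen.append(f.read()), lambda f: seen.append(f.readline())],
        )
```

The returned data can then be written to disk in binary mode, as in the loop above.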
+2  A: 

Your answer is in your example code. Just use StringIO to buffer the logfile:

import cStringIO
import urllib
import zipfile

zipped = urllib.urlopen('http://www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    # Here's where we buffer:
    logbuffer = cStringIO.StringIO(logfile.read())
    logfile.close()

    for handler in [handler1, handler2]:
        handler(logbuffer)
        # StringIO objects support seek():
        logbuffer.seek(0)

    unzipped.extract(f_info)
David Eyk
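
For readers on Python 3, the same buffering approach might look roughly like the sketch below. It assumes io.BytesIO in place of cStringIO.StringIO; process_archive and the sample handlers are hypothetical names, and the archive is created in memory so the example runs without a network connection.

```python
import io
import os
import tempfile
import zipfile


def process_archive(zip_bytes, handlers, out_dir):
    """Run every handler on every member, then extract the member to out_dir."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as unzipped:
        for f_info in unzipped.infolist():
            # Buffer the member once so it can be rewound.
            with unzipped.open(f_info) as logfile:
                logbuffer = io.BytesIO(logfile.read())
            for handler in handlers:
                logbuffer.seek(0)  # rewind before each handler
                handler(logbuffer)
            unzipped.extract(f_info, out_dir)


# Demo with an in-memory archive; a real download could supply zip_bytes
# via urllib.request.urlopen(url).read().
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("xyz.txt", "alpha\nbeta\n")

line_counts = []
first_lines = []
out_dir = tempfile.mkdtemp()
process_archive(
    raw.getvalue(),
    [lambda f: line_counts.append(len(f.readlines())),
     lambda f: first_lines.append(f.readline())],
    out_dir,
)
```

Rewinding before each handler, rather than after, means a handler that stops reading partway through does not affect the next one.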