Hello,

In some Python unit tests for a program I'm working on, we use in-memory zip files for end-to-end tests. In setUp() we create a simple zip file, but in some tests we want to overwrite some archive members. For this we do "zip.writestr(archive_name, zip.read(archive_name) + new_content)". Something like:

import zipfile
from StringIO import StringIO

def Foo():
    zfile = StringIO()
    zip = zipfile.ZipFile(zfile, 'a')
    zip.writestr(
        "foo",
        "foo content")
    zip.writestr(
        "bar",
        "bar content")
    zip.writestr(
        "foo",
        zip.read("foo") +
        "some more foo content")
    print zip.read("bar")

Foo()

The problem is that this works fine in Python 2.4 and 2.5, but not 2.6. In Python 2.6 this fails on the print line with "BadZipfile: File name in directory "bar" and header "foo" differ."

It seems that it is reading the correct file bar, but that it thinks it should be reading foo instead.

I'm at a loss. What am I doing wrong? Is this not supported? I tried searching the web but could find no mention of similar problems. I read the zipfile documentation, but could not find anything (that I thought was) relevant, especially since I'm calling read() with the filename string.

Any ideas?

Thank you in advance!

+1  A: 

The PKZIP file format is highly structured, and merely appending to the end will screw that up. I can't speak to earlier versions working, but the workaround is to open the zipfile for reading, open a new one for writing, copy the contents of the first into it, and then add your additional components at the end. When finished, replace the original zipfile with the newly created one.
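A minimal sketch of that copy-and-replace workaround, using in-memory buffers as in the question. It is written for Python 3 (io.BytesIO and bytes rather than StringIO and str), and the helper names append_to_member and build_example are made up for illustration; they are not part of zipfile. It copies member names and data only, so timestamps and other per-member metadata are not preserved.

```python
import io
import zipfile


def append_to_member(archive_bytes, member_name, extra):
    """Return new archive bytes in which member_name has extra appended.

    Hypothetical helper: every member is copied into a fresh archive,
    so the result never holds two entries with the same name.
    """
    source = zipfile.ZipFile(io.BytesIO(archive_bytes), mode='r')
    target_buffer = io.BytesIO()
    target = zipfile.ZipFile(target_buffer, mode='w')
    try:
        for info in source.infolist():
            data = source.read(info.filename)
            if info.filename == member_name:
                data += extra
            # Only the name and the data are carried over to the new archive.
            target.writestr(info.filename, data)
    finally:
        target.close()
        source.close()
    return target_buffer.getvalue()


def build_example():
    """Build the two-member archive from the question, in memory."""
    buf = io.BytesIO()
    zf = zipfile.ZipFile(buf, mode='w')
    try:
        zf.writestr('foo', b'foo content')
        zf.writestr('bar', b'bar content')
    finally:
        zf.close()
    return buf.getvalue()


original = build_example()
updated = append_to_member(original, 'foo', b' some more foo content')
result = zipfile.ZipFile(io.BytesIO(updated))
print(result.namelist())
```

In a test, the bytes returned by append_to_member would simply replace the buffer created in setUp().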

The traceback I get when running your code is:

Traceback (most recent call last):
  File "zip.py", line 19, in <module>
    Foo()
  File "zip.py", line 17, in Foo
    print zip.read("bar")
  File "/usr/lib/python2.6/zipfile.py", line 834, in read
    return self.open(name, "r", pwd).read()
  File "/usr/lib/python2.6/zipfile.py", line 874, in open
    zinfo.orig_filename, fname)
zipfile.BadZipfile: File name in directory "bar" and header "foo" differ.

Upon closer inspection, I notice that you are reading from a file-like StringIO opened in 'a' (append) mode, which should result in a read error since 'a' is not generally readable, and which certainly must be seek()ed between reads and writes. I'm going to fool around some and update this.

Update:

Having stolen pretty much all of this code from Doug Hellmann's excellent Python Module of the Week, I find that it works pretty much as I expected. One cannot merely append to a structured PKZIP file, and if the code in the original post ever did work, it was by accident:

import zipfile
import datetime

def create(archive_name):
    print 'creating archive'
    zf = zipfile.ZipFile(archive_name, mode='w')
    try:
        zf.write('/etc/services', arcname='services')
    finally:
        zf.close()

def print_info(archive_name):
    zf = zipfile.ZipFile(archive_name)
    for info in zf.infolist():
        print info.filename
        print '\tComment:\t', info.comment
        print '\tModified:\t', datetime.datetime(*info.date_time)
        print '\tSystem:\t\t', info.create_system, '(0 = Windows, 3 = Unix)'
        print '\tZIP version:\t', info.create_version
        print '\tCompressed:\t', info.compress_size, 'bytes'
        print '\tUncompressed:\t', info.file_size, 'bytes'
        print
    zf.close()

def append(archive_name):
    print 'appending archive'
    zf = zipfile.ZipFile(archive_name, mode='a')
    try:
        zf.write('/etc/hosts', arcname='hosts')
    finally:
        zf.close()

def expand_hosts(archive_name):
    print 'expanding hosts'
    zf = zipfile.ZipFile(archive_name, mode='r')
    try:
        host_contents = zf.read('hosts')
    finally:
        zf.close()

    zf = zipfile.ZipFile(archive_name, mode='a')
    try:
        zf.writestr('hosts', host_contents + '\n# hi mom!')
    finally:
        zf.close()

def main():
    archive = 'zipfile.zip'
    create(archive)
    print_info(archive)
    append(archive)
    print_info(archive)
    expand_hosts(archive)
    print_info(archive)

if __name__ == '__main__': main()

Notable is the output from the last call to print_info:

...
hosts
    Modified:   2010-05-20 03:40:24
    Compressed: 404 bytes
    Uncompressed:   404 bytes

hosts
    Modified:   2010-05-27 11:46:28
    Compressed: 414 bytes
    Uncompressed:   414 bytes

It did not append to the existing arcname 'hosts'; it created an additional archive member with the same name.
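The same effect can be reproduced without touching the filesystem. The sketch below is an illustration for Python 3, with a made-up function name and made-up file contents: it writes 'hosts' twice into one in-memory archive and then lists the members. As far as I can tell from CPython's zipfile, reading by name returns the most recently written entry, but the older one is still stored in the archive.

```python
import io
import warnings
import zipfile


def member_names_after_rewrite():
    """Write 'hosts' twice to one in-memory archive and inspect the result."""
    buf = io.BytesIO()
    zf = zipfile.ZipFile(buf, mode='w')
    try:
        zf.writestr('hosts', b'127.0.0.1 localhost\n')
        with warnings.catch_warnings():
            # Python 3 warns about the duplicate name; silence it for the demo.
            warnings.simplefilter('ignore')
            zf.writestr('hosts', b'127.0.0.1 localhost\n# hi mom!')
    finally:
        zf.close()

    # Reopen the finished archive and look at what it actually contains.
    zf = zipfile.ZipFile(io.BytesIO(buf.getvalue()), mode='r')
    try:
        return zf.namelist(), zf.read('hosts')
    finally:
        zf.close()


names, latest = member_names_after_rewrite()
print(names)
```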

"Je n'ai fait celle-ci plus longue que parce que je n'ai pas eu le loisir de la faire plus courte."
- Blaise Pascal

msw