I have two zip files, both of them open well with Windows Explorer and 7-zip.

However, when I open them with Python's zipfile module [ zipfile.ZipFile("filex.zip") ], one of them opens fine but the other raises the error "BadZipfile: File is not a zip file".

I've made sure that the latter is a valid zip file by opening it with 7-Zip and looking at its properties (it says 7Zip.ZIP). When I open the file in a text editor, the first two characters are "PK", which indicates that it is indeed a zip file.
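A note on the "PK" check: the leading "PK" only shows that the file begins with a local file header. Python's zipfile locates the archive through the End of Central Directory record near the end of the file, so a file can start with "PK" and still be rejected. A minimal sketch comparing the two checks (the helper name describe_zip_candidate is made up for illustration):

```python
import zipfile


def describe_zip_candidate(path):
    """Return (starts_with_pk, accepted_by_zipfile) for the file at path."""
    with open(path, "rb") as f:
        # The first two bytes of a local file header are "PK".
        starts_with_pk = f.read(2) == b"PK"
    # is_zipfile() looks for the End of Central Directory record instead.
    return starts_with_pk, zipfile.is_zipfile(path)
```

If the first value is True and the second is False, the problem is most likely at the end of the file rather than at the start.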

I'm using Python 2.5 and really don't have a clue how to go about this. I've tried it on both Windows and Ubuntu, and the problem exists on both platforms.

Update: Traceback from Python 2.5.4 on Windows:

Traceback (most recent call last):
  File "<module1>", line 5, in <module>
    zipfile.ZipFile("c:/temp/test.zip")
  File "C:\Python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\Python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "C:\Python25\lib\zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

Basically, when the _EndRecData function is called to read the "End of Central Directory" record, the comment length check fails [ endrec[7] == len(comment) ].

The values of the locals in the _EndRecData function are as follows:

END_BLOCK: 4096, comment: '\x00', data: '\xd6\xf6\x03\x00\x88,N8?, start: 4073
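Reading those values: if the block that was read is the full 4096 bytes, then the record starts 23 bytes before the end of the file. The fixed part of the End of Central Directory record is 22 bytes, which leaves exactly one byte after it (the '\x00' shown as comment). That would fail the check if the record declares a comment length other than 1. A rough sketch for inspecting this outside of zipfile follows. The function name inspect_end_record and the returned keys are made up for illustration, and it assumes the last "PK\x05\x06" in the data really is the record (the signature could in principle also occur inside a comment or other data).

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
# Fixed part of the record: signature, four 16-bit fields,
# two 32-bit fields, and the 16-bit comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def inspect_end_record(data):
    """Compare the declared comment length with the bytes actually present.

    data is the whole file (or at least its tail) as bytes.
    Returns None if no complete record is found.
    """
    start = data.rfind(EOCD_SIGNATURE)
    if start < 0 or start + EOCD_SIZE > len(data):
        return None
    fields = struct.unpack(EOCD_FORMAT, data[start:start + EOCD_SIZE])
    declared = fields[7]  # same field as endrec[7] in zipfile.py
    trailing = len(data) - (start + EOCD_SIZE)
    return {
        "start": start,
        "declared_comment_length": declared,
        "trailing_bytes": trailing,
        "consistent": declared == trailing,
    }
```

Calling it as inspect_end_record(open("c:/temp/test.zip", "rb").read()) should show whether the declared comment length and the number of trailing bytes disagree.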

+1  A: 

Try running the Unix file command on both of your files. Maybe it will give you a clue.

zed_0xff
For both files it says: Zip archive data, at least v2.0 to extract
sharjeel
Bad news. I hoped it would say something different. Do all your files get uncompressed by 7-Zip without any errors? Can they both be uncompressed with the Unix `unzip` command as well? Did you update your Python libzip bindings to the latest version?
zed_0xff
Yes, both files get uncompressed by 7-Zip as well as unzip without any errors. I haven't tried updating the libzip bindings to the latest version. How do I do that?
sharjeel
I was running Python 2.5.5. I copied zipfile.py from 2.6.5 and tried opening the file with it. That worked!
sharjeel
On second thought, it only worked once; maybe I tested it on the wrong file and didn't notice in my excitement :(
sharjeel
+1  A: 

Are you the one who compressed both files? I'd recommend recompressing the "bad" file with the normal compression method, without any fancy options that Python might not recognize.

Raladan
I tried uncompressing the file and recompressing it with Windows Explorer. It did work in this case. However, this isn't a proper solution for me, as I intend to run my program where external users will be submitting zip files and my Python program will process them.
sharjeel
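For submitted files that cannot be recompressed by hand, one possible workaround is to drop whatever follows the End of Central Directory record and its declared comment before handing the bytes to zipfile. This is only a sketch under the assumption that extra trailing bytes are the sole defect; the name open_zip_tolerantly is hypothetical, and it is written for a current Python where the exception is spelled zipfile.BadZipFile.

```python
import io
import struct
import zipfile


def open_zip_tolerantly(data):
    """Open ZIP bytes, retrying without trailing bytes if the first try fails."""
    try:
        return zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        pass  # fall through and try again on trimmed data

    # Assumption: the last signature in the data is the real end record.
    start = data.rfind(b"PK\x05\x06")
    if start < 0 or start + 22 > len(data):
        raise zipfile.BadZipFile("no End of Central Directory record found")

    # The comment length is the final 16-bit field of the 22-byte record.
    (comment_length,) = struct.unpack("<H", data[start + 20:start + 22])
    end = start + 22 + comment_length
    return zipfile.ZipFile(io.BytesIO(data[:end]))
```

A zipfile version that already accepts such files will succeed on the first attempt, so the trimming step only runs when it is needed. Files that are damaged in other ways will still raise BadZipFile.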
A: 

Along with running file on Unix/Linux as zed says, have a look at what mimetypes.guess_type(filename) says about the offending file, and at what the other functions in the mimetypes module have to say about it.
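Keep in mind that mimetypes.guess_type works from the file name (its extension) and does not read the file's contents, so it will report a zip type for anything named *.zip. A small sketch, with guess_from_name as a made-up helper name:

```python
import mimetypes


def guess_from_name(filename):
    """Return the MIME type guessed from the file name alone."""
    # guess_type() returns a (type, encoding) pair; the file is never opened.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type
```

Because the file is never opened, the result is the same for a file that does not exist, which limits how much this check can say about a damaged archive.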

vpit3833
For both files it says "application/zip"
sharjeel
Are you able to check the md5sum or such a thing at the place of zip file creation and at your place of reading the file with your program? Are the two md5sums identical? If you expect similar issues, try to ask the sender to send you the md5 of their zip file along with the zip file.
vpit3833
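The md5 comparison suggested above could be done on the receiving side along these lines. This is a minimal sketch; md5_of_file and the chunk size are illustrative choices, not anything prescribed in the thread.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large uploads are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest computed here differs from the one reported by the sender, the file was altered in transit rather than created in an unusual way.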
+1  A: 

Show the full traceback that you got from Python -- this may give a hint as to what the specific problem is. Unanswered: What software produced the bad file, and on what platform?

Update: The traceback indicates a problem detecting the "End of Central Directory" record in the file -- see the function _EndRecData starting at line 128 of C:\Python25\Lib\zipfile.py

Suggestions:
(1) Trace through the above function
(2) Try it on the latest Python
(3) Answer the question above.
(4) Read this and anything else found by google("BadZipfile: File is not a zip file") that appears to be relevant

John Machin
Windows, Python 2.5.2:
Traceback (most recent call last):
  File "<module1>", line 5, in <module>
    zipfile.ZipFile("c:/temp/test.zip")
  File "C:\Python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\Python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "C:\Python25\lib\zipfile.py", line 378, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file
sharjeel
Here's the formatted version of traceback: http://dpaste.de/X0Pb/
sharjeel
Thanks for the link. I've already gone through it, but that didn't help. Tested on Python 2.5.4 and 2.6.5 on Windows, and on Python 2.5.2 on Ubuntu 64-bit.
sharjeel
A: 

Have you tried a newer Python or, if that is too much trouble, simply a newer zipfile.py? I have successfully used a copy of zipfile.py from Python 2.6.2 (the latest at the time) with Python 2.5 in order to open some zip files that weren't supported by Python 2.5's zipfile module.

Baffe Boyois
Yes, I've tried it with 2.6.5 as well. The problem persists :(
sharjeel