Using Python 2.4 and the built-in zipfile module, I cannot read very large zip files (greater than 1 or 2 GB) because it wants to store the entire contents of the uncompressed file in memory. Is there another way to do this (either with a third-party library or some other hack), or must I "shell out" and unzip it that way (which isn't as cross-platform, obviously)?
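For reference, the straightforward approach that exhausts memory looks roughly like this (a minimal sketch; the archive and member names are placeholders):

import zipfile

zf = zipfile.ZipFile( "big.zip" )        # placeholder archive name
data = zf.read( "huge_member.dat" )      # ZipFile.read() returns the entire
                                         # decompressed member as one in-memory
                                         # string, so a multi-GB file blows up
zf.close()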
+1
A:
Have a look at http://stackoverflow.com/questions/297345/create-a-zip-file-from-a-generator-in-python which discusses a similar problem.
Harley
2008-12-03 23:07:34
Thanks, but unfortunately they just discuss zipping a file, not unzipping. If you look at the source code of the zipfile.py library, it uses zlib to decompress a file into a single string, which is what's using all the memory.
Marc Novakowski
2008-12-03 23:23:01
+11
A:
Here's an outline of how to decompress a large file without ever holding the whole thing in memory.
import zipfile
import zlib

src = open( doc, "rb" )  # doc is the path to the zip archive
zf = zipfile.ZipFile( src )
for m in zf.infolist():
    # Examine the header
    print m.filename, m.header_offset, m.compress_size, repr(m.extra), repr(m.comment)
    src.seek( m.header_offset )
    src.read( 30 )  # Skip the fixed-size local file header; good to use struct to unpack this.
    nm = src.read( len(m.filename) )
    if len(m.extra) > 0: ex = src.read( len(m.extra) )
    if len(m.comment) > 0: cm = src.read( len(m.comment) )
    # Build a decompression object for a raw deflate stream (-15 means no zlib header)
    decomp = zlib.decompressobj( -15 )
    # Read the compressed data in blocks so only one block is in memory at a time
    out = open( m.filename, "wb" )
    remaining = m.compress_size
    while remaining > 0:
        block = src.read( min( 65536, remaining ) )
        if not block:
            break  # truncated archive
        remaining -= len( block )
        out.write( decomp.decompress( block ) )
    out.write( decomp.flush() )
    out.close()
zf.close()
src.close()
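One caveat: this outline assumes every member is deflated. The -15 passed to zlib.decompressobj() tells zlib to expect a raw deflate stream with no zlib header, which is how deflated members are stored inside a zip archive; a member whose compress_type is zipfile.ZIP_STORED is not compressed at all, and its bytes can simply be copied out in blocks without a decompression object.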
S.Lott
2008-12-04 03:08:28