ansaurus

Question

Python decompressing gzip chunk-by-chunk

Answer 1

A:

gzip and zlib wrap the same DEFLATE-compressed data in different headers (and trailers).

See http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib

Try d = zlib.decompressobj(16+zlib.MAX_WBITS).
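A quick way to see the header difference in practice is to feed an in-memory gzip stream to a decompressor with the default `wbits` and then with `16 + zlib.MAX_WBITS`. This is a minimal sketch; the helper `try_decompress` and the sample data are made up for illustration.

```python
import gzip
import zlib

# Sample data, gzip-compressed in memory (gzip header + DEFLATE data + CRC-32/size trailer).
original = b"hello gzip " * 100
gz_data = gzip.compress(original)

def try_decompress(data, wbits):
    """Return the decompressed bytes, or None if zlib rejects the header."""
    d = zlib.decompressobj(wbits)
    try:
        return d.decompress(data) + d.flush()
    except zlib.error:
        return None

# Default wbits (MAX_WBITS) expects a zlib header, so the gzip stream is rejected.
print(try_decompress(gz_data, zlib.MAX_WBITS))                   # None
# Adding 16 tells zlib to expect a gzip header and trailer instead.
print(try_decompress(gz_data, 16 + zlib.MAX_WBITS) == original)  # True
```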

And you might try changing your chunk size to a power of 2 (say CHUNKSIZE=1024) for possible performance reasons.
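Putting both suggestions together, a chunk-by-chunk reader might look roughly like this. It is a sketch, not the asker's code: `iter_gunzip` and `CHUNKSIZE = 1024` are hypothetical names/values, and an `io.BytesIO` stands in for a real file or socket.

```python
import gzip
import io
import zlib

CHUNKSIZE = 1024  # hypothetical; a power of 2 as suggested, tune as needed

def iter_gunzip(fileobj, chunksize=CHUNKSIZE):
    """Yield decompressed pieces of a gzip stream read chunk by chunk."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect a gzip header
    while True:
        chunk = fileobj.read(chunksize)
        if not chunk:
            break
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()  # emit anything still buffered inside the decompressor
    if tail:
        yield tail

# Usage: an in-memory gzip stream stands in for a file or socket.
data = b"line of text\n" * 5000
stream = io.BytesIO(gzip.compress(data))
result = b"".join(iter_gunzip(stream))
print(result == data)  # True
```

Note that a single `decompressobj` stops at the end of the first gzip member; any bytes after it (for example a second concatenated member) end up in `d.unused_data` rather than being decompressed.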

wisty 2010-03-11 11:40:57

That did it perfectly. Thanks. (Now, why isn't this hint in the Python docs?)

2010-03-11 14:30:23

Python's zlib module is just a wrapper around the C version of zlib, and it's not well documented at all. Mind you, the 16+zlib.MAX_WBITS trick isn't documented in the C version either, and it's not the first time I've seen an undocumented zlib feature.
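For reference, the current Python `zlib` documentation does describe the `wbits` ranges: negative values for raw DEFLATE, 8 to 15 for a zlib header, 16 + (8 to 15) for a gzip header, and (for decompression) 32 + (8 to 15) to auto-detect zlib or gzip. The sketch below round-trips one payload through each container; `compress_with` and `payload` are illustrative names only.

```python
import zlib

payload = b"same payload, three containers " * 20

def compress_with(wbits):
    """Compress payload, using wbits to pick the container format."""
    c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return c.compress(payload) + c.flush()

raw_deflate = compress_with(-zlib.MAX_WBITS)      # no header or trailer
zlib_stream = compress_with(zlib.MAX_WBITS)       # 2-byte zlib header, Adler-32 trailer
gzip_stream = compress_with(16 + zlib.MAX_WBITS)  # gzip header, CRC-32 + size trailer

# Decoding needs the matching wbits, except 32 + MAX_WBITS,
# which auto-detects a zlib or gzip header (but not raw DEFLATE).
assert zlib.decompress(raw_deflate, -zlib.MAX_WBITS) == payload
assert zlib.decompress(zlib_stream, zlib.MAX_WBITS) == payload
assert zlib.decompress(gzip_stream, 16 + zlib.MAX_WBITS) == payload
for stream in (zlib_stream, gzip_stream):
    assert zlib.decompress(stream, 32 + zlib.MAX_WBITS) == payload
print("all wbits variants round-trip")
```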

wisty 2010-03-12 17:33:56

Answer 2

A:

The gzip module is used for handling gzip files:

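import gzip

CHUNKSIZE = 1024  # read size in bytes (example value)

with gzip.GzipFile('23046-8.txt.gz', 'rb') as f:
    buffer = f.read(CHUNKSIZE)  # returns already-decompressed bytes
    while buffer:
        print(buffer)
        buffer = f.read(CHUNKSIZE)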

Ignacio Vazquez-Abrams 2010-03-11 12:56:53
