Hi,

I am trying to unzip a gzipped file in Python using the gzip module. The constraint is that I receive 160 bytes of data at a time, and I need to unzip each chunk before I request the next 160 bytes. Partial unzipping is fine before requesting the next 160 bytes. The code I have is:

import gzip
import time
import StringIO

file = open('input_cp.gz', 'rb')
buf = file.read(160)
sio = StringIO.StringIO(buf)
f = gzip.GzipFile(fileobj=sio)
data = f.read()
print data

The error I am getting is IOError: CRC check failed. I assume this is because it expects the entire gzipped content to be present in buf, whereas I am reading in only 160 bytes at a time. Is there a workaround for this?

Thanks

+3  A: 

Create your own class with a read() method (and whatever else GzipFile needs from fileobj, like close and seek) and pass it to GzipFile. Something like:

class MyBuffer(object):
  def __init__(self, input_file):
    self.input_file = input_file

  def read(self, size=-1):
    if size < 0:
      size = 160
    return self.input_file.read(min(160, size))

Then use it like:

file = open('input_cp.gz', 'rb')
mybuf = MyBuffer(file)
f = gzip.GzipFile(fileobj=mybuf)
data = f.read()
fserb
A: 

Made the changes. No errors this time, but a blank line was output to the console. I'm pretty sure 160 bytes are enough to unzip. I also tried with 2000 bytes and got the same blank line.

class MyBuffer(object):

    def __init__(self, input_file):
        self.input_file = input_file

    def read(self, size=-1):
        if size<0:
            size = 160
        return self.input_file.read(min(160,size))

    def tell(self):
        return

    def seek(self, start, end):
        return

    def close(self):
        return

file = open('input_cp.gz', 'rb')
mybuf = MyBuffer(file)
f = gzip.GzipFile(fileobj=mybuf)
data = f.read()
print data

You probably need to implement meaningful versions of seek and tell.
fserb
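
For what it's worth, here is a rough sketch of what "meaningful" seek and tell could look like: the wrapper simply delegates both to the wrapped file. The names ChunkedReader and chunk_size are made up for illustration, the code assumes Python 3 (the thread itself uses Python 2.4), and an in-memory gzip stream stands in for input_cp.gz. Older gzip.py versions appear to call tell() and seek() on the file object to detect end of file, so stubs that return None may make the stream look empty; that is an assumption about the Python 2.4 internals, not something confirmed in this thread.

```python
import gzip
import io


class ChunkedReader(object):
    """Hypothetical wrapper: caps each read at chunk_size bytes and
    delegates positioning to the wrapped file object."""

    def __init__(self, input_file, chunk_size=160):
        self.input_file = input_file
        self.chunk_size = chunk_size

    def read(self, size=-1):
        # Never hand back more than chunk_size bytes per call.
        if size is None or size < 0:
            size = self.chunk_size
        return self.input_file.read(min(self.chunk_size, size))

    def tell(self):
        # Report the real position of the underlying file.
        return self.input_file.tell()

    def seek(self, offset, whence=0):
        # Standard file-like signature: offset plus whence (0, 1 or 2).
        return self.input_file.seek(offset, whence)

    def close(self):
        self.input_file.close()


# Self-contained demo: an in-memory gzip stream stands in for input_cp.gz.
original = b"hello gzip " * 200
raw = io.BytesIO(gzip.compress(original))
f = gzip.GzipFile(fileobj=ChunkedReader(raw))
data = f.read()
```

Note that f.read() with no argument still pulls chunks until the whole stream is consumed, so this only limits the size of each underlying read; it does not hand control back between chunks.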
@fserb. I don't think that's the problem. The CRC check happens right at the end; I guess it reads up to EOF for this CRC. Pasting the correct error:

File "gunzip.py", line 27, in ?
  data = f.read()
File "/usr/local/lib/python2.4/gzip.py", line 218, in read
  self._read(readsize)
File "/usr/local/lib/python2.4/gzip.py", line 273, in _read
  self._read_eof()
File "/usr/local/lib/python2.4/gzip.py", line 309, in _read_eof
  raise IOError, "CRC check failed"
IOError: CRC check failed
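
Since the traceback shows GzipFile.read() running through to _read_eof and checking the CRC there, one possible way around it is to skip GzipFile and feed each 160-byte chunk to a streaming decompressor from the zlib module. The sketch below is not from this thread: it assumes Python 3, iter_gunzip and chunk_size are hypothetical names, and an in-memory gzip stream stands in for input_cp.gz. Passing 16 + zlib.MAX_WBITS as wbits asks zlib to expect the gzip header and trailer.

```python
import gzip
import io
import zlib


def iter_gunzip(fileobj, chunk_size=160):
    """Yield decompressed pieces while reading chunk_size bytes at a time.

    Hypothetical helper; handles a single-member gzip stream only.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # decompress() returns whatever output this chunk made available,
        # which may be empty (for example while still inside the header).
        piece = decomp.decompress(chunk)
        if piece:
            yield piece
    # Collect any output still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail


# Self-contained demo with data large enough to span several chunks.
original = b"".join(str(i).encode() for i in range(5000))
source = io.BytesIO(gzip.compress(original))
pieces = list(iter_gunzip(source))
result = b"".join(pieces)
```

Each piece can be processed before the next 160 bytes are requested, which matches the "partial unzipping is OK" requirement. Files made of several concatenated gzip members would need extra handling (a fresh decompressor fed from unused_data), which this sketch leaves out.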