tags:
views: 559
answers: 3

I'm trying to figure out the best way to compress a stream with Python's zlib.

I've got a file-like input stream (input, below) and an output function which accepts a file-like object (output_function, below):

with open("file") as input:
    output_function(input)

And I'd like to gzip-compress input chunks before sending them to output_function:

with open("file") as input:
    output_function(gzip_stream(input))

It looks like the gzip module assumes that either the input or the output will be a gzip'd file on disk… So I assume that the zlib module is what I want.

However, it doesn't natively offer a simple way to create a streaming file-like object… And the stream compression it does support comes by way of manually adding data to a compression buffer, then flushing that buffer.

Of course, I could write a wrapper around zlib.Compress.compress and zlib.Compress.flush (Compress is returned by zlib.compressobj()), but I'd be worried about getting buffer sizes wrong, or something similar.
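(For concreteness, a rough sketch of the kind of wrapper I mean, written as a generator in modern Python — the chunk size is arbitrary, and wbits=31 is just how you ask compressobj for gzip framing:)

```python
import zlib

def gzip_stream(input, chunk_size=8192):
    # chunk_size is arbitrary; wbits=31 makes compressobj emit gzip-framed output
    compressor = zlib.compressobj(wbits=31)
    while True:
        chunk = input.read(chunk_size)
        if not chunk:
            break
        data = compressor.compress(chunk)
        if data:
            yield data
    # flush() emits any buffered data plus the gzip trailer
    yield compressor.flush()

# quick demo with an in-memory "file" (io.BytesIO stands in for a real stream)
import io
compressed = b"".join(gzip_stream(io.BytesIO(b"lumberjack " * 50)))
```

Note this gives back a generator rather than a true file-like, which is part of what I'm unhappy about.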

So, what's the simplest way to create a streaming, gzip-compressing file-like with Python?

Edit: To clarify, the input stream and the compressed output stream are both too large to fit in memory, so something like output_function(StringIO(zlib.compress(input.read()))) doesn't really solve the problem.

+1  A: 

The gzip module supports compressing to a file-like object: pass a fileobj parameter to GzipFile, as well as a filename. The filename you pass in doesn't need to exist; it's only used to fill in the filename field of the gzip header.
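For instance (a sketch in current Python — the in-memory buffer here is just a stand-in for whatever output stream you actually have):

```python
import gzip
import io

# any writable file-like object works as fileobj; BytesIO is a stand-in here
buf = io.BytesIO()
# "file" need not exist on disk; it only populates the gzip header's filename field
with gzip.GzipFile(filename="file", mode="wb", fileobj=buf) as gz:
    gz.write(b"some data to compress")

compressed = buf.getvalue()
```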

Mmmm… I hadn't noticed that… But I'm not sure it will work: either the `fileobj` must be a gzip'd input stream, or an output stream which the gzip'd data will be written to. So, better than nothing, but still not quite what I'd like.
David Wolever
+1  A: 

Use the cStringIO (or StringIO) module in conjunction with zlib:

>>> import zlib
>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write(zlib.compress("I'm a lumberjack"))
>>> s.seek(0)
>>> zlib.decompress(s.read())
"I'm a lumberjack"
jcdyer
The problem with this, though, is that the entire input stream must be loaded into memory (when it's passed to `zlib.compress`) and then must be loaded into memory *again* when it is returned from `zlib.decompress`.
David Wolever
It never leaves memory, if you use StringIO. You said in your question that you wanted a "file-like object", which is common python terminology for an object that has similar methods to a file object. It doesn't say anything about whether it lives on disk or not. But then you also suggested that you didn't want a gz file. Can you please be more clear about what you are really looking for?
jcdyer
Err, sorry - yes, that is my fault. In my mind "file-like object" implies "something intended to be processed in chunks", but I guess that's a faulty assumption. I have updated the question.
David Wolever
have you looked at `zlib.compressobj()` and `zlib.decompressobj()`? Perfect for chunking.
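A minimal round-trip, for instance (the chunk boundaries are arbitrary):

```python
import zlib

# feed data in pieces; compress() may buffer internally and return b'' until flush()
c = zlib.compressobj()
compressed = c.compress(b"chunk one ") + c.compress(b"chunk two") + c.flush()

# the same works in reverse with decompressobj()
d = zlib.decompressobj()
original = d.decompress(compressed) + d.flush()
```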
jcdyer
Yup, I have. As I mentioned (albeit not very clearly), they work, but their interface isn't very standard, and it could depend on my getting things like buffer sizes correct.
David Wolever
+3  A: 

It's quite kludgy (self-referencing, etc.; I just spent a few minutes writing it, nothing really elegant), but it does what you want if you're still interested in using gzip instead of zlib directly.

Basically, GzipWrap is a (very limited) file-like object that produces a gzipped file out of a given iterable (e.g., a file-like object, a list of strings, any generator...)

Of course, it produces binary so there was no sense in implementing "readline".

You should be able to expand it to cover other cases or to be used as an iterable object itself.

from gzip import GzipFile

class GzipWrap(object):
    # input is a file-like object that feeds the input
    def __init__(self, input, filename=None):
        self.input = input
        self.buffer = ''
        # GzipFile writes its compressed output back into self via write() below
        self.zipper = GzipFile(filename, mode='wb', fileobj=self)

    def read(self, size=-1):
        if size < 0 or len(self.buffer) < size:
            for s in self.input:
                self.zipper.write(s)
                if size > 0 and len(self.buffer) >= size:
                    self.zipper.flush()
                    break
            else:
                # input exhausted: close the GzipFile so it writes the gzip trailer
                self.zipper.close()
        if size < 0:
            # return everything buffered so far and clear the buffer
            ret, self.buffer = self.buffer, ''
        else:
            ret, self.buffer = self.buffer[:size], self.buffer[size:]
        return ret

    def flush(self):
        pass

    def write(self, data):
        self.buffer += data

    def close(self):
        self.input.close()
Heim
haha very smart - passing `self` to the GzipFile. I like it!
David Wolever
(ok, so I see your point that it's not particularly elegant to pass 'self' to the GzipFile… But I still think it's a neat hack).
David Wolever
I've corrected a little bug in the code. When reading with size < 0, it didn't clear the buffer. I don't think you'll be using it like that, but a bug is a bug... O:)
Heim