I currently use the following code to decompress a gzipped response from urllib2:

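import gzip
import StringIO
import urllib2

opener = urllib2.build_opener()
response = opener.open(req)  # req is a urllib2.Request built earlier
html = response.read()
if response.headers.get('content-encoding', '') == 'gzip':
    # Wrap the compressed bytes in a file-like object so GzipFile can read them
    html = gzip.GzipFile(fileobj=StringIO.StringIO(html)).read()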

Does it handle deflated responses too, or do I need to write separate code to handle them?

A: 

You need a slightly different approach; see this SO thread.

Alex Martelli
A: 

To answer the comment above: the HTTP spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3) says:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

I take that to mean it should use identity. I've never seen a server that doesn't.

Knio
It will work only if the server says "deflate" and delivers "zlib". "zlib" != "deflate". See the SO thread that Alex Martelli quotes.
John Machin
I did some more testing and you're right: it doesn't work on all servers. However, there is no such thing as a "zlib" encoding, and deflate *is* the zlib algorithm; it just needs a proper header or something.
Knio
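
To illustrate the deflate point raised in the comments above, here is a minimal sketch and not a definitive fix. It assumes a server labelling a body "deflate" may send either a zlib-wrapped stream or a raw deflate stream with no header, so it tries the first and falls back to the second. The function name decompress_deflate is made up for this example.

```python
import zlib


def decompress_deflate(data):
    """Decompress a body sent with Content-Encoding: deflate.

    Hypothetical helper: tries the zlib-wrapped form first, then
    falls back to a raw deflate stream (no header, no checksum).
    """
    try:
        # Default wbits: expects the 2-byte zlib header and Adler-32 trailer.
        return zlib.decompress(data)
    except zlib.error:
        # Negative wbits tells zlib there is no header or trailer at all.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

In the question's code this would go in an extra branch for a 'deflate' content-encoding, next to the existing 'gzip' check.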
A: 

There is a better way outlined at:

The author explains how to decompress chunk by chunk, rather than all at once in memory. This is the preferred method when larger files are involved.
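
For anyone who wants the general idea, here is a rough sketch of chunked gzip decompression using zlib.decompressobj. It is not necessarily the linked author's code. The name iter_gunzip and the 16 KB chunk size are assumptions made for illustration, and fileobj is assumed to be any object with a read(size) method, such as a urllib2 response.

```python
import zlib


def iter_gunzip(fileobj, chunk_size=16 * 1024):
    """Yield decompressed pieces of a gzip stream read from fileobj.

    Hypothetical helper: only one compressed chunk is held in memory
    at a time instead of the whole response body.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered inside the decompressor.
    tail = decompressor.flush()
    if tail:
        yield tail
```

Each yielded piece can be written straight to a file or passed to a parser, so memory use stays roughly bounded by the chunk size plus the decompressor's window.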

I also found this helpful site for testing:

  • http://carsten.codimi.de/gzip.yaws/
Gringo Suave