I'm using

 data=urllib2.urlopen(url).read()

I want to know:

  1. How can I tell whether the data at a URL is gzipped?

  2. Does urllib2 automatically decompress the data if it is gzipped, so that data is always a plain string?

+3  A: 

This checks if the content is gzipped and decompresses it:

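import gzip
import urllib2
from StringIO import StringIO

# Ask for gzip explicitly; the server may compress only when asked.
request = urllib2.Request(url)
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
data = response.read()
if response.info().get('Content-Encoding') == 'gzip':
    # Wrap the compressed bytes in a file-like object for GzipFile.
    data = gzip.GzipFile(fileobj=StringIO(data)).read()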
ars
A: 

If you mean a plain .gz file, then no: urllib2 will not decode it, and you will get the .gz file's bytes back unchanged.
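
For the plain .gz case, one way to tell is to look at the first two bytes, since gzip streams start with the magic number 0x1f 0x8b. A minimal sketch; looks_gzipped and maybe_gunzip are helper names made up for this example, not part of urllib2:

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(data):
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def maybe_gunzip(data):
    """Decompress data if it looks gzipped; otherwise return it unchanged."""
    if looks_gzipped(data):
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data
```

You would call it as maybe_gunzip(urllib2.urlopen(url).read()). The check is only a heuristic: a non-gzip payload that happens to begin with those two bytes would be treated as gzip and fail to decompress.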

If you mean automatic HTTP-level compression using Content-Encoding: gzip or deflate, the client has to request it deliberately by sending an Accept-Encoding header.
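
To make the opt-in concrete, here is roughly how a client would ask for gzip. This is a sketch, not the only way to do it; build_gzip_request is a hypothetical helper name, and the import fallback is there only so the same snippet also runs on Python 3, where urllib2 became urllib.request:

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent


def build_gzip_request(url):
    """Build a request that tells the server gzip is acceptable."""
    request = urllib2.Request(url)
    request.add_header("Accept-Encoding", "gzip")
    return request
```

Nothing is sent until you pass the request to urllib2.urlopen. The server is still free to answer uncompressed, so check the response's Content-Encoding header before trying to decode the body.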

urllib2 doesn't set this header, so the response it gets back will not be compressed. You can safely fetch the resource without worrying about compression, though because the transfer is uncompressed the request may take longer.
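
If you do add an Accept-Encoding header yourself, decoding the body becomes your job. A rough sketch of that step, assuming you already have the raw body and the value of the Content-Encoding header; decode_body is a hypothetical helper name, and the raw-deflate fallback is a defensive guess rather than something the HTTP spec requires:

```python
import gzip
import io
import zlib


def decode_body(raw, content_encoding):
    """Undo gzip or deflate based on the Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    if encoding == "deflate":
        try:
            return zlib.decompress(raw)  # zlib-wrapped stream
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    return raw  # no encoding, or one this sketch does not handle
```

With a urllib2 response this would be used as data = decode_body(response.read(), response.info().get('Content-Encoding')), which leaves data a plain string whether or not the server compressed it.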

bobince