I use the following Python code to download web pages from servers that support gzip compression:
import gzip
import urllib2
from StringIO import StringIO

url = "http://www.v-gn.de/wbb/"

# Ask the server for a gzip-compressed response.
request = urllib2.Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
content = response.read()
response.close()

# Decompress the response body in memory.
html = gzip.GzipFile(fileobj=StringIO(content)).read()
This generally works, but for the URL above it fails with a struct.error exception.
I get a similar result if I use wget with an "Accept-encoding" header. However, browsers seem to be able to decompress the response.
So my question is: is there a way to get my Python code to decompress the HTTP response without disabling compression by removing the "Accept-encoding" header?
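To make the question concrete, here is a minimal sketch of the kind of tolerant decompression I have in mind. It rests on an assumption I have not confirmed: that the server sends a gzip stream whose 8-byte trailer (CRC32 and length) is missing or malformed, which would explain why browsers cope while the gzip module raises an exception. The helper name `gunzip_tolerant` and the test data are hypothetical; the only library call it relies on is `zlib.decompressobj` with `wbits = 16 + zlib.MAX_WBITS`, which makes zlib parse a gzip header.

```python
import gzip
import io
import zlib


def gunzip_tolerant(data):
    """Decompress gzip bytes and return whatever could be decoded.

    This does not raise when the gzip trailer is missing or when extra
    bytes follow the stream. The trade-off is that a truncated stream
    is not detected, so the CRC32 check is effectively skipped.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header, not a zlib one.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decompressor.decompress(data)


# Hypothetical test data: build a gzip stream in memory, then cut off
# its 8-byte trailer to simulate an incomplete server response.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"<html>hello</html>")
truncated = buf.getvalue()[:-8]

print(gunzip_tolerant(truncated))
```

In the download code this would replace the `gzip.GzipFile(...)` line with `html = gunzip_tolerant(content)`. Whether this actually helps for the URL in question depends on the assumption above being right.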
For completeness, here's the line I use for wget:
wget --user-agent="Mozilla" --header="Accept-Encoding: gzip,deflate" http://www.v-gn.de/wbb/