ansaurus

Question

Answer 1

+2 A:

I ran the command you specified. It downloaded gzipped data into index.html, which I renamed to index.html.gz. Running gzip -d index.html.gz then led to an error: gzip: index.html.gz: unexpected end of file.

My second try was zcat index.html.gz, which worked fine, except that after the </html> tag it printed the same error as above.

$ zcat index.html.gz
...
  </td>
 </tr>
</table>


</body>
</html>
gzip: index.html.gz: unexpected end of file

The server is faulty: the gzip stream it sends ends prematurely.

Notinlist 2010-09-06 14:27:29

Answer 2

+3 A:

It appears you can call readline() on the gzip.GzipFile object, but read() raises a struct.error because the file ends abruptly.

Since readline() works (except at the very end), you can read the response line by line and stop when the error is raised:

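import urllib2
import StringIO
import gzip
import struct

url = "http://www.v-gn.de/wbb/"
request = urllib2.Request(url)
# Ask the server for a gzip-compressed response.
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
content = response.read()
response.close()
# Wrap the raw bytes in a file-like object so GzipFile can read them.
fh = StringIO.StringIO(content)
html = gzip.GzipFile(fileobj=fh)
try:
    # Iterating reads line by line, which works until the truncated end.
    for line in html:
        line = line.rstrip()
        print(line)
except struct.error:
    # The stream ends abruptly; everything before this point was printed.
    pass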
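
For readers on Python 3, where urllib2 and StringIO no longer exist, the same idea can be sketched with zlib instead of gzip.GzipFile. This is only an illustration under stated assumptions, not part of the approach above: the function name decompress_truncated_gzip is hypothetical, and the sample data is generated locally rather than fetched from the server. It relies on zlib.decompressobj returning whatever it could decode and on its eof attribute to tell whether the stream ended properly.

```python
import gzip
import zlib


def decompress_truncated_gzip(data):
    """Return (recovered_bytes, complete) for a possibly truncated gzip stream.

    Hypothetical helper for illustration only.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    recovered = decompressor.decompress(data)
    # eof is True only if the end of the compressed stream was reached.
    return recovered, decompressor.eof


if __name__ == "__main__":
    # Locally generated sample data standing in for the server response.
    original = b"<html><body>hello</body></html>\n" * 50
    truncated = gzip.compress(original)[:-8]  # drop the 8-byte gzip trailer
    recovered, complete = decompress_truncated_gzip(truncated)
    print(len(recovered), complete)
```

If complete comes back False, the data was cut short, which matches the "unexpected end of file" message from gzip and zcat; the recovered bytes are still usable as far as they go.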

unutbu 2010-09-06 14:53:25


What's wrong with this gzip format?
