I'm using Python to programmatically download a zip file from a web server. Downloading it with a web browser works fine. I've written this (partial) script:

response = urllib2.urlopen(url, data, 10)
the_page = response.read()
f = open(filename, 'w')
f.write(the_page)
f.close()

The request succeeds and I get data. The problem is that the downloaded file, a zip file, doesn't work; it appears to be corrupt. It is roughly the right length, and when viewed in a text editor its content looks like that of a zip file. Here are the headers from the download:

Content-Length: 9891
Content-Disposition: Content-Disposition:attachment; filename="TrunkBackup_20101230.zip"
Date: Wed, 30 Dec 2009 12:22:08 GMT
Accept-Ranges: bytes

When I check the length of the response, it is correct at 9891. I suspect what's happening is that when I call response.read() the result is a string with carriage returns 'helpfully' normalized (say, \r to \n). When I write the file, the binary data is slightly wrong, and the zip file is corrupt.

My problem is (A) I'm not sure if I'm right, and (B) if I am right, how do I save the binary data itself?

+4  A: 

Try opening the file in binary mode:

 f = open(filename, 'wb')
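
The likely reason this helps: on Windows, a file opened with 'w' is in text mode, so every "\n" byte is expanded to "\r\n" as it is written, which changes the zip data. The bytes returned by response.read() are probably fine; the damage happens on write. Below is a minimal sketch of a binary-safe save, not the asker's actual script. It assumes Python 3, and the names save_response, payload and target are made up for illustration. An in-memory zip stands in for the urllib2 response so the example runs without a network connection.

```python
import io
import os
import shutil
import tempfile
import zipfile


def save_response(response, filename, chunk_size=16 * 1024):
    """Copy a file-like response to disk without altering any bytes."""
    # 'wb' opens the file in binary mode, so no newline translation occurs.
    with open(filename, 'wb') as f:
        shutil.copyfileobj(response, f, chunk_size)


# Stand-in for the object returned by urlopen(): a small zip built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('hello.txt', 'line one\r\nline two\n')
payload = buf.getvalue()

# Hypothetical target path inside a fresh temporary directory.
target = os.path.join(tempfile.mkdtemp(), 'demo.zip')
save_response(io.BytesIO(payload), target)

# Read the file back in binary mode and compare it with the original bytes.
with open(target, 'rb') as f:
    saved = f.read()
print(saved == payload)

# testzip() returns None when every member passes its CRC check.
with zipfile.ZipFile(target) as zf:
    print(zf.testzip())
```

Copying in chunks also avoids holding the whole download in memory, which matters little for a 9891-byte file but helps with larger ones.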
dusan
perfect! that worked.
Steve Cooper
+1  A: 

You could use the urlretrieve function for downloading raw binary files.
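
A minimal sketch of that approach, under a few assumptions: it uses Python 3, where the function lives in urllib.request (in Python 2 it is urllib.urlretrieve), and it points at a local file:// URL so it runs without a network connection. The names source, targetpath and payload are placeholders, not anything from the question.

```python
import os
import pathlib
import tempfile
from urllib.request import urlretrieve  # Python 2: from urllib import urlretrieve

# Create a small binary file to act as the "remote" resource.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'source.bin')
payload = b'PK\x03\x04' + b'\r\n\n\r\x00\xff' * 100
with open(source, 'wb') as f:
    f.write(payload)

# A file:// URL stands in for the real http:// URL of the zip file.
url = pathlib.Path(source).as_uri()
targetpath = os.path.join(workdir, 'copy.bin')

# urlretrieve writes the target file itself and returns (path, headers).
saved_path, headers = urlretrieve(url, targetpath)

# Read the copy back in binary mode to compare it with the original bytes.
with open(saved_path, 'rb') as f:
    copied = f.read()
print(copied == payload)
```

With a real download, url would be the http:// address and targetpath the desired local filename.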

3lectrologos
I had problems with urlretrieve -- my app just stopped. Dunno why.
Steve Cooper
That's curious... I think urlretrieve(url, targetpath) works fine for files (and is probably a fair bit faster than the "url->string->file" way). Maybe you could show me your code.
3lectrologos