ansaurus

Question

Python and urllib

Answer 1

+2 A:

Per the docs, urlretrieve saves the file to disk and returns a tuple (filename, headers), so the file is already saved by the time urlretrieve returns.
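Here is a minimal Python 3 sketch of the same call (in Python 3, `urlretrieve` lives in `urllib.request`). The second argument must be a complete file path, not a bare directory. For that reason, the hypothetical helper `download_to_dir` builds the path from the last segment of the URL. It assumes the URL ends in a usable file name.

```python
import os
from urllib.request import urlretrieve


def download_to_dir(url, dest_dir):
    """Hypothetical helper: save the resource at `url` inside `dest_dir`.

    Assumes the URL's last path segment is a usable file name.
    """
    # urlretrieve expects a full target file path, not just a directory
    filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    # The file is fully written to disk by the time urlretrieve returns
    saved_path, headers = urlretrieve(url, filename)
    return saved_path
```

For example, passing the Autauga County edges URL and `"F:\\"` should save the archive as `F:\tl_2008_01001_edges.zip`.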

You can open and read the ZIP file you've retrieved with the zipfile module of the standard library. glob does not work inside ZIP files; it only matches paths in ordinary filesystem directories.
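You can still apply glob-style patterns to the member names inside an archive. Below is a minimal Python 3 sketch that swaps in the standard library's `fnmatch` for glob and matches against `ZipFile.namelist()`. The helper names are hypothetical. Unlike glob, fnmatch's `*` also matches `/`, so a pattern can reach into subfolders of the archive.

```python
import fnmatch
import zipfile


def matching_members(zip_path, pattern):
    """Return archive member names matching a glob-style `pattern`.

    glob only looks at the real filesystem, so fnmatch is applied to
    the archive's own list of member names instead.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return fnmatch.filter(zf.namelist(), pattern)


def read_member(zip_path, name):
    """Return the raw bytes of one member without extracting it to disk."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)
```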

Alex Martelli 2010-02-18 15:37:37

Thanks - so if I use urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/01001_Autauga_County/tl_2008_01001_edges.zip", "F://") that saves it to my F-drive? Regarding my glob question, I wasn't very clear: I was wondering how to loop through a list of FTP folders on the site, rather than through the contents of a ZIP file.

celenius 2010-02-18 15:42:42

Answer 2

+2 A:

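import os
import urllib2

# Save the download to /tmp/test.zip
out = os.path.join("/tmp", "test.zip")
url = "ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/01001_Autauga_County/tl_2008_01001_edges.zip"
page = urllib2.urlopen(url)
# Binary mode writes the ZIP bytes unchanged; "with" closes the file
with open(out, "wb") as f:
    f.write(page.read())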
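The answer above is Python 2. Here is a minimal Python 3 sketch of the same idea (`urllib2.urlopen` became `urllib.request.urlopen`). One design change: it streams the data with `shutil.copyfileobj` instead of calling `page.read()`, so a large archive is copied in chunks rather than held in memory all at once. `save_url` is a hypothetical helper name.

```python
import shutil
from urllib.request import urlopen


def save_url(url, out_path):
    """Copy the resource at `url` to `out_path` in binary mode.

    shutil.copyfileobj reads in chunks, so the whole file is never
    held in memory at once.
    """
    with urlopen(url) as page, open(out_path, "wb") as out:
        shutil.copyfileobj(page, out)
    return out_path
```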

ghostdog74 2010-02-18 15:50:14

Thanks - this explains how I need to go about saving the zipfile object, which is very useful.

celenius 2010-02-18 19:00:08

Answer 3

+2 A:

Use urllib2.urlopen() for the zip file data and directory listing.

To process zip files with the zipfile module, you can write them to a disk file which is then passed to the zipfile.ZipFile constructor. Retrieving the data is straightforward using read() on the file-like object returned by urllib2.urlopen().
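Here is a minimal Python 3 sketch of the write-then-open step. `response` is assumed to be any object with a `read()` method, such as the one `urllib.request.urlopen()` returns. The helper name is hypothetical. Note that `zipfile.ZipFile` also accepts a file-like object, so `zipfile.ZipFile(io.BytesIO(response.read()))` would skip the disk file if you don't need to keep a copy.

```python
import zipfile


def zip_from_response(response, zip_path):
    """Write the bytes from a file-like `response` to `zip_path` and open it.

    The caller is responsible for closing the returned ZipFile.
    """
    with open(zip_path, "wb") as f:
        f.write(response.read())
    return zipfile.ZipFile(zip_path)
```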

Fetching directories:

>>> files = urllib2.urlopen('ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/').read().splitlines()
>>> for l in files[:4]: print l
... 
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01001_Autauga_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01003_Baldwin_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01005_Barbour_County
drwxrwsr-x    2 0        4009         4096 Nov 26  2008 01007_Bibb_County
>>>

Or, splitting each line to keep only the directory names:

>>> for l in files[:4]: print l.split()[-1]
... 
01001_Autauga_County
01003_Baldwin_County
01005_Barbour_County
01007_Bibb_County
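Here is a minimal Python 3 sketch of the same parsing. It assumes the server returns a Unix-style `ls -l` listing like the one in this answer, and the helper names are hypothetical. Unlike `split()[-1]`, it keeps only entries whose permissions start with `d`, so plain files are skipped. It also uses `split(None, 8)` so names containing spaces survive intact. `urljoin` then builds each subdirectory's URL for a download loop.

```python
from urllib.parse import urljoin


def directory_names(listing_lines):
    """Return directory names from Unix-style FTP LIST output.

    Assumes lines such as
    'drwxrwsr-x 2 0 4009 4096 Nov 26 2008 01001_Autauga_County':
    nine whitespace-separated fields, the first starting with 'd'
    for a directory.
    """
    names = []
    for line in listing_lines:
        fields = line.split(None, 8)
        if len(fields) == 9 and fields[0].startswith("d"):
            names.append(fields[8])
    return names


def child_urls(base_url, names):
    """Build one URL per subdirectory; base_url must end with '/'."""
    return [urljoin(base_url, name + "/") for name in names]
```

In Python 3, `urlopen(...).read()` returns bytes, so decode the listing (ASCII is enough for names like these) before calling `splitlines()`.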

gimel 2010-02-18 16:04:55

Thank you very much - this explains exactly what I need to do. I'm now happily downloading a few hundred files using this.

celenius 2010-02-18 18:59:33
