How can I download files from a website using wildcards in Python? I have a site that I need to download a file from periodically. The problem is that the filename changes each time, although a portion of it stays the same. How can I use a wildcard to specify the unknown portion of the filename in a URL?

+7  A: 

If the filename changes, there must still be a link to the file somewhere (otherwise nobody would ever guess the filename). A typical approach is to get the HTML page that contains a link to the file, search through that looking for the link target, and then send a second request to get the actual file you're after.

Web servers do not generally implement such a "wildcard" facility as you describe, so you must use other techniques.
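
As a rough sketch of the two-request approach described above, something like the following might work using only the standard library. It assumes the link appears as a plain `<a href="...">` on a page you can fetch; the names `LinkCollector`, `find_matching_links`, and `download_matching`, as well as any URL or pattern you pass in, are made up for illustration and are not from the question. The wildcard is applied locally with `fnmatch`, since the server itself won't expand it.

```python
import fnmatch
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_matching_links(html, base_url, pattern):
    """Return absolute URLs whose file name matches a shell-style pattern."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.links:
        url = urljoin(base_url, href)  # resolve relative links
        filename = posixpath.basename(urlparse(url).path)
        if fnmatch.fnmatchcase(filename, pattern):
            matches.append(url)
    return matches


def download_matching(page_url, pattern):
    """Fetch the page (first request), then download each match (second request)."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for url in find_matching_links(html, page_url, pattern):
        filename = posixpath.basename(urlparse(url).path)
        urlretrieve(url, filename)  # saves into the current directory
```

You would call it as, say, `download_matching("http://example.com/downloads/", "report_*.csv")`, where both the URL and the pattern are placeholders. If the page builds its links with JavaScript, this sketch will not see them.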

Greg Hewgill
+1  A: 

If the site also offers FTP access, you could try logging in to the FTP server using ftplib. From the Python docs:

from ftplib import FTP
ftp = FTP('ftp.cwi.nl')   # connect to host, default port
ftp.login()               # user anonymous, passwd anonymous@

The ftp object has a dir method that lists the contents of a directory. You could use this listing to find the name of the file you want.
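
For example, here is a minimal sketch of filtering a listing with a wildcard. Note that it uses `nlst()` rather than `dir()`, because `nlst()` returns bare names that are easier to match; that substitution, the helper names `matching_names` and `download_matching_ftp`, and the anonymous login are all assumptions on my part, and some servers may return names in a different form.

```python
import fnmatch
from ftplib import FTP


def matching_names(names, pattern):
    """Keep only the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def download_matching_ftp(host, directory, pattern):
    """Log in anonymously, list a directory, and fetch the matching files."""
    with FTP(host) as ftp:
        ftp.login()              # anonymous login, as in the docs example
        ftp.cwd(directory)       # hypothetical directory holding the files
        for name in matching_names(ftp.nlst(), pattern):
            with open(name, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
```

A call might look like `download_matching_ftp("ftp.example.com", "/pub", "data_*.zip")`, with the host, directory, and pattern replaced by your own.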

DoR