answers:

5

I have a Python application that relies on a file that is downloaded by a client from a website.

The website is not under my control and has no API to check for a "latest version" of the file.

Is there a simple way to access the file (in Python) via its URL and check its date (or size) without having to download it to the client's machine each time?

Update: Thanks to those who mentioned the "Last-Modified" date. That is the correct header to look at.

I guess I didn't state the question well enough. How do I do this from a Python script? I want the application to check the file and then download it only if the remote Last-Modified date is newer than the local copy's date.

+4  A: 

Check the Last-Modified header.

EDIT: Try urllib2 (urllib.request in Python 3).

EDIT 2: This short tutorial should give you a pretty good feel for accomplishing your goal.
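A minimal Python 3 sketch of the Last-Modified approach, assuming the server answers HEAD requests and actually sends a Last-Modified header. The helper names remote_last_modified and download_if_newer are hypothetical. The HEAD request fetches only the headers, so the file body is transferred only when the remote copy looks newer.

```python
import datetime
import email.utils
import os
import shutil
import urllib.request


def remote_last_modified(url):
    """Send a HEAD request and return Last-Modified as an aware UTC
    datetime, or None if the server does not send the header."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        value = resp.headers.get("Last-Modified")
    if not value:
        return None
    dt = email.utils.parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # "-0000" style dates come back naive; HTTP dates are GMT.
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    return dt


def download_if_newer(url, path):
    """Download url to path only if the remote copy looks newer than the
    local file. Returns True if the file was (re)downloaded."""
    remote = remote_last_modified(url)
    # Without a remote date there is nothing to compare, so always download.
    if os.path.exists(path) and remote is not None:
        local = os.path.getmtime(path)  # seconds since the epoch
        if remote.timestamp() <= local:
            return False  # local copy is at least as new
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    if remote is not None:
        # Stamp the local file with the server's date so later comparisons
        # are not skewed by differences between the two machines' clocks.
        ts = remote.timestamp()
        os.utime(path, (ts, ts))
    return True
```

Setting the local modification time to the server's Last-Modified value is a design choice: it means every later comparison is between two timestamps issued by the same server clock.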

Hank Gay
Also, you may want to consider using the ETag header (in conjunction with sending an If-None-Match header in the request).
Marc Novakowski
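The ETag suggestion can be sketched as a conditional GET in Python 3. This assumes the server sends an ETag; fetch_if_changed is a hypothetical helper, and the caller would store the returned tag alongside the downloaded file and pass it back on the next check.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None):
    """Conditional GET using an entity tag.

    Returns (body, etag). body is None when the server answered
    304 Not Modified, i.e. the copy identified by `etag` is still current.
    """
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib.request reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag
        raise
```

Unlike the Last-Modified comparison, this leaves the "has it changed?" decision entirely to the server, so local and remote clocks never need to agree.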
+4  A: 

There is no reliable way to do this. For all you know, the file can be created on the fly by the web server and the question "how old is this file" is not meaningful. The webserver may choose to provide Last-Modified header, but it could tell you whatever it wants.

scrible
+1  A: 

In HTTP 1.1, the Content-Disposition header field is intended to hold this kind of information in the creation-date parameter (see RFC 2183).

Gumbo
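If a server does send a creation-date parameter, it can be read with the standard library's email package, since the parameter syntax follows RFC 2183. This is a sketch only: creation_date is a hypothetical helper, and most servers never send this parameter, so expect None in practice.

```python
import email.message
import email.utils


def creation_date(content_disposition):
    """Return the RFC 2183 creation-date parameter of a Content-Disposition
    header value as a datetime, or None if the parameter is absent."""
    msg = email.message.Message()
    msg["Content-Disposition"] = content_disposition
    # get_param strips the surrounding quotes from the parameter value.
    value = msg.get_param("creation-date", header="content-disposition")
    return email.utils.parsedate_to_datetime(value) if value else None
```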
A: 

I built a tool that does this based on etags. Sounds a lot like what you're describing:

pfetch is a Twisted-based tool that does this on a schedule, can run with many, many URLs, and triggers events upon change (post-download). It's pretty simple, but still might be more complicated than you want.

This code, however, is exactly what you're asking for.

So, take your pick. :)

Dustin
+2  A: 

Take into account that 'last-modified' may not be present:

>>> from urllib import urlopen
>>> f=urlopen('http://google.com/')
>>> i=f.info()
>>> i.keys()
['set-cookie', 'expires', 'server', 'connection', 'cache-control', 'date', 'content-type']
>>> i.getdate('date')
(2009, 1, 10, 16, 17, 8, 0, 1, 0)
>>> i.getheader('date')
'Sat, 10 Jan 2009 16:17:08 GMT'
>>> i.getdate('last-modified')
>>>

Now you can compare:

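import calendar, os
remote = i.getdate('last-modified') or i.getdate('date')  # 9-tuple in GMT
if calendar.timegm(remote) > os.path.getmtime('file'):  # local mtime, epoch seconds
  open('file', 'wb').write(f.read())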
Mike