I want to run a small script that I can populate with a list of URLs and that checks when each page was last updated. Has anyone done this?

The only way I can find is a manual one using JavaScript, by pasting this into the browser URL field:

javascript:alert(document.lastModified)

Any ideas greatly received :)

A: 

If you use urllib2 (or perhaps httplib, which might be better still) in a Python script, you can inspect the returned headers for the Last-Modified field.
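As a rough illustration of the header-inspection idea, here is a minimal sketch using Python 3's standard library, where urllib.request takes the place of urllib2. The function names (parse_last_modified, fetch_last_modified, report) are made up for this example, and it assumes the server answers HEAD requests. Servers that omit Last-Modified simply yield None.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or invalid."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request (assumed to be supported) and parse Last-Modified."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)


def report(urls):
    """Print one line per URL with its Last-Modified value or the failure reason."""
    for url in urls:
        try:
            print(url, "-", fetch_last_modified(url))
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(url, "- request failed:", exc)
```

Calling report(["http://boflynn.net", "http://slashdot.org"]) would print one line per URL. A HEAD request is used here only to avoid downloading the page body; switching to a plain GET is a reasonable fallback if a server rejects HEAD.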

Jon Cage
A: 

The following will step through an array of URLs and display the last modified date or, if it's not present, the date of the server request.

// Requires "using System;" for Console; run inside a method such as Main.
string[] urls = { "http://boflynn.net", "http://slashdot.org" };
foreach (string url in urls)
{
    System.Net.HttpWebRequest req =
        (System.Net.HttpWebRequest) System.Net.WebRequest.Create(url);
    // Dispose the response so the underlying connection is released.
    using (System.Net.HttpWebResponse resp =
        (System.Net.HttpWebResponse) req.GetResponse())
    {
        Console.WriteLine("{0} - {1}", url, resp.LastModified);
    }
}
boflynn
A: 

It depends on what you mean by "last updated". Sure, there is the Last-Modified HTTP header, but it can be very misleading. For example, if the page is served dynamically, there is a good chance that this field will be the current time, even if the content of the page itself (the part useful to humans) has not been updated in a long time. This page itself is a good example of this phenomenon.

If you are truly interested in the last time the content was updated, then I don't have an immediate answer.
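One possible workaround, offered only as a sketch and not as an established answer, is to have the script remember a hash of each page body and note when that hash first changes between runs. The names content_digest and detect_change, the seen dictionary, and the now argument are all hypothetical. The approach assumes you fetch the body yourself and persist seen between runs. Dynamic pages that embed timestamps or rotating ads will look "changed" on every run unless you strip those parts first.

```python
import hashlib


def content_digest(body):
    """Return the SHA-256 hex digest of a page body given as bytes."""
    return hashlib.sha256(body).hexdigest()


def detect_change(url, body, seen, now):
    """Return the time at which the current content of url was first observed.

    seen maps url -> (digest, timestamp) and is updated in place whenever
    the digest differs from the stored one (or the url is new).
    """
    digest = content_digest(body)
    previous = seen.get(url)
    if previous is None or previous[0] != digest:
        seen[url] = (digest, now)
    return seen[url][1]
```

This only tells you when your script first noticed a change, not when the author actually edited the page, so the resolution is limited by how often the script runs.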

Adam Paynter
I did wonder about that. Thanks for clarifying.
leen3o