I need to find out whether something has changed on a website using its RSS feed. My current solution is to repeatedly download the entire RSS file, get the entries.length and compare it with the last known entries.length. I find this very inelegant. Can anyone suggest a different approach?

Details:

• My application is an HTML file which uses JavaScript. It should be small enough to function as a desktop gadget or a browser extension.
• Currently, it downloads the RSS file every thirty seconds just to get the length.
• It can download from any website with an RSS feed.

Comments and suggestions are appreciated, thanks in advance~ ^^

A: 

There are HTTP headers that can be used to determine if a resource has changed. Learn how to use the following headers to make your application more efficient.

HTTP Request Headers

  • If-Modified-Since
  • If-None-Match

HTTP Response Headers

  • Last-Modified
  • ETag

The basic strategy is to store the above-mentioned response headers that are returned on the first request and then send the values you stored in the HTTP request headers in future requests. If the HTTP resource has not been changed, you'll get back an HTTP 304 - Not Modified response and the resource will not even be downloaded. So this results in a very lightweight check for updates. If the resource has changed, you'll get back an HTTP 200 OK response and the resource will be downloaded in the usual way.
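
As a rough sketch of that strategy in JavaScript (the asker's language): the names `buildConditionalHeaders`, `checkFeed`, the `cached` object shape and the `fetchImpl` parameter are all made up for illustration, and this assumes an environment where `fetch` exists and cross-origin requests to the feed are permitted (for example a browser extension with host permissions).

```javascript
// Build the conditional request headers from the validators saved earlier.
// `cached` is a hypothetical object shaped like { etag, lastModified }.
function buildConditionalHeaders(cached) {
  const headers = {};
  if (cached && cached.etag) headers["If-None-Match"] = cached.etag;
  if (cached && cached.lastModified) headers["If-Modified-Since"] = cached.lastModified;
  return headers;
}

// Ask the server for the feed, letting it answer 304 if nothing changed.
// Resolves to { changed, body, cached }; store `cached` for the next call.
// `fetchImpl` is only a parameter so a different fetch can be swapped in.
async function checkFeed(url, cached, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers: buildConditionalHeaders(cached) });
  if (response.status === 304) {
    // Not modified: no body was sent, keep the validators we already have.
    return { changed: false, body: null, cached };
  }
  if (!response.ok) throw new Error(`Unexpected HTTP status ${response.status}`);
  return {
    changed: true,
    body: await response.text(),
    cached: {
      etag: response.headers.get("ETag"),
      lastModified: response.headers.get("Last-Modified"),
    },
  };
}
```

You would call `checkFeed(url, null)` the first time, keep the returned `cached` value, and pass it back in on each later poll. If the server sends neither header, every call simply comes back as a normal 200 download.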

Asaph
Does this work for dynamic content?
echo
@justin: If the dynamic content provider has bothered to implement it, then yes; otherwise, no. It's HTTP-level functionality, so it's not specifically tied to static or dynamic content.
Asaph
@justin: I just checked the RSS feeds for the StackOverflow blog, Joel Spolsky's blog (joelonsoftware) and Jeff Atwood's blog (codinghorror) and all 3 of them implement `ETag/If-None-Match` headers. In addition, joelonsoftware and codinghorror implement `Last-Modified/If-Modified-Since` headers.
Asaph
+3  A: 

Many RSS feeds use the <lastBuildDate> element, which is a child of <channel>, to indicate when they were last updated. There's also a <pubDate> element, child of <item>, that serves the same purpose for an individual item. If you plan on reading Atom feeds, they have the <updated> element.
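
A minimal sketch of comparing that timestamp in JavaScript. The function names `feedTimestamp` and `hasFeedChanged` are hypothetical, and the regular expressions are a shortcut rather than a real XML parser (in a browser, `DOMParser` would be the more robust choice). It also relies on `Date.parse` accepting RSS-style dates such as `Mon, 06 Sep 2010 00:01:00 GMT`, which common engines do but the language spec does not guarantee.

```javascript
// Return the feed-level timestamp in milliseconds since the epoch,
// or null if no usable timestamp is found.
// Tries RSS <lastBuildDate> first, then the first <updated> element,
// which in an Atom document is typically the feed-level one.
function feedTimestamp(xmlText) {
  const match =
    /<lastBuildDate>([^<]+)<\/lastBuildDate>/.exec(xmlText) ||
    /<updated>([^<]+)<\/updated>/.exec(xmlText);
  if (!match) return null;
  const time = Date.parse(match[1].trim());
  return Number.isNaN(time) ? null : time;
}

// Compare against the timestamp remembered from the previous poll.
function hasFeedChanged(xmlText, lastSeenTime) {
  const time = feedTimestamp(xmlText);
  // Without a usable timestamp we cannot tell, so report a change to be safe.
  return time === null || lastSeenTime === null || time > lastSeenTime;
}
```

Note that this still downloads the whole feed on every poll; it only replaces the entries.length comparison with a cheaper and more reliable check.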

echo
beat me to it....
No Refunds No Returns
A: 

You should be keeping track of the GUIDs/article IDs to see if you've seen an article before.
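
Something along these lines, as a sketch only: `takeNewItems` and the `{ guid, title }` item shape are invented for the example, and it assumes you have already parsed the feed into an array of items.

```javascript
// Return the items whose guid has not been seen before, and remember them.
// `seen` is a Set of guid strings kept between polls;
// `items` is a hypothetical array like [{ guid, title }].
function takeNewItems(seen, items) {
  const fresh = [];
  for (const item of items) {
    if (!seen.has(item.guid)) {
      seen.add(item.guid);
      fresh.push(item);
    }
  }
  return fresh;
}
```

Unlike comparing entries.length, this still notices a new article when the feed drops an old one at the same time and the count stays the same.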

You should also see if your source supports conditional GETs. They allow you to check if anything has changed without needing to download the whole file. You can quickly check with this tool to see if your source supports conditional GETs. (I wish everyone did.)

Kelly