Part of an app I'm building needs to check RSS feeds for updates. I'm looking for a reliable way to know if a feed has new entries.

I know that people sometimes publish posts dated in the future and then publish posts dated in the present, which could cause some entries to be missed if I only compare dates. There are probably more complications than that. I also know that hashing the title or content would perform poorly and give unreliable results, since those can change without any new entry being added. And I know that a few years ago when I was maintaining a podcast RSS feed manually I never changed the item.

So I need some way to reliably check RSS, Atom, etc. feeds for new entries since they were last checked.

Specifically, this application will be written in Python for Google App Engine using Universal Feed Parser, but I doubt that matters too much in this case.

A: 

You can use a conditional GET by adding an If-Modified-Since header to your HTTP request. Well-behaved servers will return a 304 Not Modified response if there are no changes.
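As a rough illustration, here is a minimal sketch of a conditional GET using only the Python standard library. The function names `build_conditional_request` and `fetch_if_changed` are made up for this example, and it assumes you stored the `ETag` and `Last-Modified` values from the previous response. Sending `If-None-Match` alongside `If-Modified-Since` is an addition here, for servers that provide an ETag.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators saved from the last fetch.

    Both helper names in this sketch are hypothetical, not part of any library.
    """
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None on a 304 response."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        # urllib reports 304 Not Modified as an HTTPError.
        if error.code == 304:
            return None, etag, last_modified
        raise
```

If the returned body is `None`, the feed has not changed and there is nothing to parse. Otherwise, save the returned validators for the next check. A server that ignores the headers simply returns the full feed every time, so this only saves work; it does not replace checking the entries themselves.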

James Deville
How reliable is this? How likely am I to run into a badly behaved server?
donut
Most of the major servers and major blogging platforms support conditional GETs. It's usually a requirement from heavy bloggers since it saves bandwidth. I would guess that using this, plus Tim's response for servers that don't support conditional GETs, would get you 99% of the way there.
James Deville
A: 

Feed items have a unique ID and/or a URL that is likely to be unique. Hash only those together to get a quick and reasonable way to detect changes. But the only way to be absolutely sure would be to hash the content, like you said.
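A minimal sketch of that idea, assuming each entry is a dict-like object with optional `id` and `link` keys (entries produced by a feed parser usually look like this). The names `entry_key` and `find_new_entries` are made up for this example, and SHA-1 is just one convenient choice of hash.

```python
import hashlib


def entry_key(entry):
    """Hash the entry's ID and link together; title and content are ignored."""
    identity = "\n".join((entry.get("id") or "", entry.get("link") or ""))
    return hashlib.sha1(identity.encode("utf-8")).hexdigest()


def find_new_entries(entries, seen_keys):
    """Return entries whose key is not in seen_keys, and record their keys."""
    new_entries = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen_keys:
            seen_keys.add(key)
            new_entries.append(entry)
    return new_entries
```

Persist `seen_keys` between checks (for example in your datastore) and pass it in on every run. Because the key ignores titles and content, an edited post is not reported as new, and because it ignores dates, a post dated in the future does not hide later ones.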

Tim Santeford
Well, as I also said, if the content changes, the feed will be marked as updated even though there are no new posts. I think I need to edit my question. What would the problems be with relying solely on the unique ID?
donut