Let's say I have an RSS feed that lists the 3 newest questions on SO. At 1 o'clock, the feed looks like this:

  • While making an RSS reader which saves articles, how can I prevent duplicates?
  • Convert char array to UNICODE in MFC C++
  • How to deploy a Java Swing application with an embedded JavaDB database?

At 2 o'clock, this feed looks like:

  • django url from another template than the one associated with the view-function
  • While making an RSS reader which saves articles, how can I prevent duplicates?
  • Convert char array to UNICODE in MFC C++

(The last two articles also appeared in the 1 o'clock feed; these are the duplicates.)

I want to download the RSS feed every 5 minutes, parse it, and save only the articles that aren't already saved. I do not want duplicates (items that remain in the updated feed, as in the example above). What can I use to determine whether an article is already saved? Thanks

+4  A: 

In theory, you can just use guid for RSS 2, and id for Atom. These are each supposed to be permanent and unique. However, in practice some sites don't conform to this, so you have to use heuristics.
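As a rough illustration of the identifier-based approach, here is a minimal Python sketch using only the standard library. The function names `declared_ids` and `unseen_ids` and the in-memory `seen` set are assumptions made for this example; a real reader would persist the seen identifiers (for instance in a database) and handle malformed feeds.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 elements.
ATOM = "{http://www.w3.org/2005/Atom}"


def declared_ids(feed_xml):
    """Return the declared identifier of every item/entry (None if missing)."""
    root = ET.fromstring(feed_xml)
    ids = []
    # RSS 2.0: each <item> may carry a <guid>.
    for item in root.iter("item"):
        ids.append((item.findtext("guid") or "").strip() or None)
    # Atom: each <entry> should carry an <id>.
    for entry in root.iter(ATOM + "entry"):
        ids.append((entry.findtext(ATOM + "id") or "").strip() or None)
    return ids


def unseen_ids(feed_xml, seen):
    """Return identifiers not yet in `seen`, and record them in `seen`."""
    fresh = []
    for entry_id in declared_ids(feed_xml):
        if entry_id is not None and entry_id not in seen:
            seen.add(entry_id)
            fresh.append(entry_id)
    return fresh
```

Calling `unseen_ids` on each downloaded copy of the feed with the same `seen` set yields only the entries that were not present in an earlier download. Entries without a usable identifier are skipped here, which is exactly the case where heuristics become necessary.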

Matthew Flaschen
Sorry, I am making a generic RSS reader which should be able to read all feeds from all sites.
Time Machine
Koning Baard: That's where the heuristics come in. Check for duplicate permalink, title, description/summary, etc. It depends on how sensitive you want to be to duplicates, risking hypersensitivity when you go above the spec's requirements.
Peter Hosey
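
One possible way to sketch such heuristics in Python: build a deduplication key from the declared identifier when there is one, fall back to the permalink, and otherwise hash the normalized title and summary. The dict layout (`id`, `link`, `title`, `summary`) and the name `dedup_key` are assumptions for this example, not part of any feed library.

```python
import hashlib


def _clean(value):
    """Collapse runs of whitespace and trim; treat None as empty."""
    return " ".join((value or "").split())


def dedup_key(entry):
    """Build a key for an entry dict with optional id/link/title/summary."""
    entry_id = _clean(entry.get("id"))
    if entry_id:
        return "id:" + entry_id
    link = _clean(entry.get("link"))
    if link:
        return "link:" + link
    # Last resort: fingerprint the case-folded title and summary.
    text = _clean(entry.get("title")).lower() + "\n" + _clean(entry.get("summary")).lower()
    return "hash:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
```

An article counts as already saved when its key is in the set of stored keys. How aggressively to normalize is a judgment call: the looser the comparison, the more likely two genuinely different articles are treated as duplicates.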