I'm creating my own RSS reader, and to track which RSS items were already downloaded I'm converting the link of each item to an MD5 hash. When an RSS feed is loaded, it adds all the items to a ListView object, then checks their MD5 hashes against a text file and removes the ones that already exist. This way, only new items are shown.

This works for now because I have only 5 feeds, and each feed loads only about 10 items, but over the long term the text file would become cluttered with very old MD5 hashes. I'm wondering whether an SQLite database would be a better fit, storing the feed link, the MD5 hash, and the date. But even then, the database would keep growing and would eventually become very slow.

How do I mitigate this issue?

+1  A: 

If you're storing the date the hash was last retrieved, update that date every time you check the feed and the item still exists, and create a maintenance routine that removes entries whose date is older than a pre-determined interval.

Run this maintenance routine, say, once a day. You could use a database for this, or even a flat cache file (in XML or something). You could then de-serialise an array of your MD5 hashes, dates, etc., remove any entries you don't need any more, and serialise the array back to your cache file.
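A minimal sketch of the flat-file variant, assuming Python, a JSON cache file instead of XML, and a 30-day retention interval. The function names and the interval are illustrative choices, not anything from the question:

```python
import json
import time
from pathlib import Path

# Assumed retention interval: 30 days. Tune to taste.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def load_cache(path):
    """Return {md5_hash: last_seen_timestamp}; empty if no cache file exists yet."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def save_cache(path, cache):
    """Serialise the cache back to the flat file."""
    Path(path).write_text(json.dumps(cache), encoding="utf-8")


def mark_seen(cache, hashes, now=None):
    """Refresh the date of every hash that is still present in a feed."""
    now = time.time() if now is None else now
    for h in hashes:
        cache[h] = now


def prune(cache, max_age=MAX_AGE_SECONDS, now=None):
    """Maintenance routine: keep only entries seen within max_age seconds."""
    now = time.time() if now is None else now
    return {h: seen for h, seen in cache.items() if now - seen <= max_age}
```

Call mark_seen() on every feed check and run prune() followed by save_cache() once a day. Because an item's date is refreshed for as long as it stays in the feed, it is only dropped some time after it has disappeared from the feed, so it should not reappear as "new".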

Andy Shellam
This seems like a pretty good solution, thanks. :)
cam
A: 

Why not use the updated field in the RSS feed? That way, all you need to save is the latest fetch date, and you compare it to the feed's updated date.

2010-03-10T14:27:03Z

If updated > last_fetched_date, get the feed.
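The comparison above could look roughly like this in Python, assuming the feed's timestamp always has the UTC form shown (real feeds may use other date formats, which this sketch does not handle). The function names are made up for illustration:

```python
from datetime import datetime, timezone


def parse_feed_timestamp(text):
    """Parse a UTC timestamp such as 2010-03-10T14:27:03Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def needs_fetch(updated, last_fetched):
    """Fetch if the feed was never fetched or changed since the last fetch."""
    if last_fetched is None:
        return True
    return updated > last_fetched
```

With this approach only one date per feed has to be stored, rather than one hash per item.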

JeremySpouken
I would love to, but I have a very specific use for this, and unfortunately, many of the "RSS" feeds I need to implement don't provide a Date field. Only a link, and a title.
cam
Ow... In a perfect world all RSS feeds would have the guid field implemented :-)
JeremySpouken
The last changed date on the feed is optional according to the RSS 2.0 spec, and all fields in an item are optional, although each item must provide either a title or a description. If you cannot determine when the feed was changed, or cannot uniquely identify an item, don't cache it ;-)
Andy Shellam
A: 

SQLite is quite fast, even with "large" files. It will take quite a bit of activity before the database becomes large enough to be an issue.
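As a rough sketch of how this might look with Python's built-in sqlite3 module: the table name, column names, and helper functions below are assumptions for illustration. Making the hash the primary key gives SQLite an index to look it up by, so the lookup does not have to scan the whole table as it grows.

```python
import hashlib
import sqlite3
import time


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) seen_items table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_items ("
        " hash TEXT PRIMARY KEY,"      # primary key => indexed lookups
        " feed_url TEXT NOT NULL,"
        " last_seen REAL NOT NULL)"
    )
    return conn


def link_hash(link):
    """MD5 of the item link, as described in the question."""
    return hashlib.md5(link.encode("utf-8")).hexdigest()


def filter_new(conn, feed_url, links, now=None):
    """Return the links not seen before; record or refresh all of them."""
    now = time.time() if now is None else now
    new_links = []
    for link in links:
        h = link_hash(link)
        seen = conn.execute(
            "SELECT 1 FROM seen_items WHERE hash = ?", (h,)
        ).fetchone()
        if seen is None:
            new_links.append(link)
        conn.execute(
            "INSERT OR REPLACE INTO seen_items (hash, feed_url, last_seen)"
            " VALUES (?, ?, ?)",
            (h, feed_url, now),
        )
    conn.commit()
    return new_links


def cleanup(conn, max_age_seconds, now=None):
    """Delete records not seen within max_age_seconds; return rows removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM seen_items WHERE last_seen < ?", (now - max_age_seconds,)
    )
    conn.commit()
    return cur.rowcount
```

Pass a file path to open_db() for a persistent database; the in-memory default is only convenient for trying it out.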

Robert Kluin
Just to clarify, I would also implement a "cleanup" routine to clear very old records. But doing more is likely premature optimization.
Robert Kluin