ansaurus

Question

Best Database structure for storing RSS feeds

Answer 1

+3 A:

I would suggest you don't try to optimize away every possible copy of feed data at this stage of development (design, I presume). Concentrate on getting it working first. When you're done, profile it: if you find that you can indeed save X% of storage by using links or shared data between feeds, and X is large enough to pay for the time it would take to optimize your DB, only then would I suggest implementing any such more advanced scheme.
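One way to measure X before committing to a sharing scheme is sketched below in Python. It assumes the items are available as hypothetical (feed, content) pairs, hashes each body with SHA-1, and reports what fraction of the body bytes would disappear if every distinct body were stored only once. It ignores the extra link column and any index on the hash, so treat the result as an upper bound on the savings.

```python
import hashlib


def duplicate_savings(items):
    """Return the percentage of content bytes saved by storing each
    distinct article body only once.

    `items` is an iterable of (feed_name, content) pairs (an assumed shape).
    """
    total_bytes = 0
    unique_sizes = {}  # SHA-1 hex digest -> size of that body in bytes
    for _feed, content in items:
        data = content.encode("utf-8")
        total_bytes += len(data)
        unique_sizes[hashlib.sha1(data).hexdigest()] = len(data)
    if total_bytes == 0:
        return 0.0
    saved = total_bytes - sum(unique_sizes.values())
    return 100.0 * saved / total_bytes


items = [
    ("Bobs site", "Hi!"),
    ("Planet Randompeople Aggregator", "Hi!"),
    ("Bobs site", "Another post"),
]
print(f"{duplicate_savings(items):.1f}% saved")  # 16.7% saved
```

Here two copies of "Hi!" plus one 12-byte post give 3 saved bytes out of 18. On a real feed database it is the size of this number, not its existence, that should decide whether sharing is worth it.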

Assaf Lavie 2009-03-09 00:27:54

Answer 2

+2 A:

As Assaf said, I wouldn't worry about storing duplicated articles if they come from different feeds, at least for now. The complication it would add isn't worth the few kilobytes of space you'd save.

I suppose you could take a SHA-1 hash of the content, run SELECT id FROM articles WHERE hash = $hash, and, if a row exists, simply set a content_link_id which, if set, points the article's content at that other row... but what if you have two articles:

id: 1
title: My First Post!
feed: Bobs site
content: Hi!
hash: abc
link: no
content_link_id:

id: 2
title: My First Post!
feed: Planet Randompeople Aggregator
content:
hash: abc
content_link_id: 1
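The insert path described above might look roughly like the following minimal sketch, using Python's standard sqlite3 module. The articles table layout mirrors the two example rows, and insert_article is a hypothetical helper. It passes the hash as a ? parameter rather than interpolating $hash into the SQL string, and it only links to rows that hold their own content, so links never chain.

```python
import hashlib
import sqlite3

# Hypothetical table mirroring the example rows above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id              INTEGER PRIMARY KEY,
    title           TEXT,
    feed            TEXT,
    content         TEXT,     -- NULL when the body lives in another row
    hash            TEXT,     -- SHA-1 of this feed's version of the body
    content_link_id INTEGER REFERENCES articles(id)
)
"""


def insert_article(conn, title, feed, content):
    """Insert an article, linking it to an existing row with the same hash if any."""
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
    # Only match a row that stores its own content, so links never chain.
    match = conn.execute(
        "SELECT id FROM articles WHERE hash = ? AND content_link_id IS NULL LIMIT 1",
        (digest,),
    ).fetchone()
    if match is None:
        body, link = content, None   # first copy: store the body here
    else:
        body, link = None, match[0]  # duplicate: store only a link
    cur = conn.execute(
        "INSERT INTO articles (title, feed, content, hash, content_link_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, feed, body, digest, link),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = insert_article(conn, "My First Post!", "Bobs site", "Hi!")
second = insert_article(conn, "My First Post!", "Planet Randompeople Aggregator", "Hi!")
```

After these two calls the second row has no content of its own and a content_link_id pointing at the first, which is the same shape as the example rows.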

...this works fine, and you've saved 3 bytes by not duplicating the article (obviously more if the article were longer).
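Reading an article back then takes one extra hop through content_link_id. A sketch, again assuming SQLite via Python's sqlite3 and the same hypothetical articles layout, loads the two example rows and resolves each article's content with a self-join and COALESCE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE articles (
        id INTEGER PRIMARY KEY, title TEXT, feed TEXT, content TEXT,
        hash TEXT, content_link_id INTEGER REFERENCES articles(id))"""
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "My First Post!", "Bobs site", "Hi!", "abc", None),
        (2, "My First Post!", "Planet Randompeople Aggregator", None, "abc", 1),
    ],
)

# Follow content_link_id one hop; unlinked rows fall back to their own content.
RESOLVE = """
SELECT a.id, a.feed, COALESCE(src.content, a.content) AS content
FROM articles AS a
LEFT JOIN articles AS src ON src.id = a.content_link_id
ORDER BY a.id
"""
rows = conn.execute(RESOLVE).fetchall()
```

Both rows resolve to "Hi!", but every read of a linked article now pays for a join.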

...but what happens when Bob decides to add adverts to his RSS feed, changing the content from Hi! to Hi!<p><img src='...'></p>, while Planet Randompeople strips out all images? Then, to update a feed item, you have to check each row whose content_link_id points at the article you are updating and compare that row's hash with the hash of the new content. If they differ, you have to break the link by copying the old content into the linking row, and only then write the new content into the original row.
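To make that cost concrete, here is a sketch of the update procedure in Python with sqlite3. It assumes the same articles layout, that each linking row's hash column holds the SHA-1 of its own feed's version of the item, and that the article being updated stores its own content (it is not itself a link). update_article is a hypothetical helper, not part of any library.

```python
import hashlib
import sqlite3


def update_article(conn, article_id, new_content):
    """Replace an article's body, first breaking links from rows whose
    stored hash no longer matches the new content."""
    new_hash = hashlib.sha1(new_content.encode("utf-8")).hexdigest()
    (old_content,) = conn.execute(
        "SELECT content FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    linkers = conn.execute(
        "SELECT id, hash FROM articles WHERE content_link_id = ?", (article_id,)
    ).fetchall()
    with conn:  # commit all changes together, or roll back on error
        for linker_id, linker_hash in linkers:
            if linker_hash != new_hash:
                # Break the link: give the linking row its own copy of the old body.
                conn.execute(
                    "UPDATE articles SET content = ?, content_link_id = NULL WHERE id = ?",
                    (old_content, linker_id),
                )
        # Rows whose hash still matches stay linked and see the new body.
        conn.execute(
            "UPDATE articles SET content = ?, hash = ? WHERE id = ?",
            (new_content, new_hash, article_id),
        )
```

In Bob's case, the Planet Randompeople row keeps the hash of "Hi!", so it gets its own copy of "Hi!" and loses its link, while Bob's row receives the new body. The case where the row being updated is itself a link needs separate handling on top of this.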

There are possibly neater ways to do that, but my point is that it can get very complicated, and you will probably only save a few kilobytes (assuming the database engine doesn't do any compression itself) on a very limited subset of posts.

Other than that, having a table of feeds and a table of items seems sensible, and it's how most other RSS-storing databases I've seen handle it.
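For comparison, the plain feeds-and-items layout might be sketched like this in SQLite via Python's sqlite3. The column names, the UNIQUE (feed_id, guid) constraint, and the save_item helper are assumptions for illustration; an article that appears in two feeds is simply stored twice.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    id    INTEGER PRIMARY KEY,
    url   TEXT NOT NULL UNIQUE,
    title TEXT
);
CREATE TABLE IF NOT EXISTS items (
    id        INTEGER PRIMARY KEY,
    feed_id   INTEGER NOT NULL REFERENCES feeds(id),
    guid      TEXT NOT NULL,      -- the item's <guid>, or its link if it has none
    title     TEXT,
    link      TEXT,
    content   TEXT,
    published TEXT,
    UNIQUE (feed_id, guid)        -- re-fetching an item never duplicates it
);
"""


def save_item(conn, feed_url, guid, title, link, content, published=None):
    """Insert an item, or update it in place if this feed already has that guid."""
    conn.execute("INSERT OR IGNORE INTO feeds (url) VALUES (?)", (feed_url,))
    (feed_id,) = conn.execute(
        "SELECT id FROM feeds WHERE url = ?", (feed_url,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO items (feed_id, guid) VALUES (?, ?)", (feed_id, guid)
    )
    conn.execute(
        "UPDATE items SET title = ?, link = ?, content = ?, published = ? "
        "WHERE feed_id = ? AND guid = ?",
        (title, link, content, published, feed_id, guid),
    )
    return feed_id
```

Usage would be conn.executescript(SCHEMA) once, then one save_item call per fetched item. Updating a changed item only touches its own row, so Bob's adverts never affect the aggregator's copy.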

dbr 2009-03-09 01:31:58
