Assume the database has tables for Users, Feeds, and Items, and some way of knowing which Items the user has already seen. I am looking for a design paradigm that can be used on the server to quickly compute [feed_id, num_unread] for each feed that the user has subscribed to.

Assume plenty of users, and that the feeds are updated periodically in the backend.

Edit: I wanted to address the problem Nick J has brought up (see below). I appreciate the solution posted by cletus, but I am not so worried about the db queries themselves; what I want is a "design paradigm" -- like keeping a watchdog process that maintains the unread counts in memory so that they can be served at any point.
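To make that concrete, I imagine something like this summary table (names are hypothetical), kept current by the watchdog, so that serving the counts is a single indexed lookup:

CREATE TABLE user_feed_unread (
    user_id    INT NOT NULL,
    feed_id    INT NOT NULL,
    num_unread INT NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, feed_id)
);

-- serving a request is then just:
SELECT feed_id, num_unread
FROM user_feed_unread
WHERE user_id = :user_id;

The open question is what keeps this correct as feeds update and items are read.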

+1  A: 

I'm not sure what to tell you exactly because what you're asking is reasonably straightforward.

First off, use Google Reader as a reference for online feed aggregators/readers. And if you're trying to recreate the functionality, Google Reader has pretty much nailed it already (imho).

Google Reader works simply by storing a list of feeds. In DB terms, you'd probably have these entities:

User: id, name, email, etc...
Feed: id, feed_name, feed_url
Content: id, feed_id, title, content
User Feed: id, user_id, feed_id, user_label (the user's subscriptions)
User Content: id, user_id, content_id, has_read (per-user, per-item read state; a flag on User Feed alone couldn't count unread items)
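
When the backend fetcher stores a new Content row, it fans it out to that feed's subscribers so there is a per-user flag to count. A sketch, assuming the schema above:

INSERT INTO user_content (user_id, content_id, has_read)
SELECT uf.user_id, :content_id, 0
FROM user_feed uf
WHERE uf.feed_id = :feed_id;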

Unread items for a user:

SELECT COUNT(1)
FROM user_content uc
WHERE uc.user_id = :user_id
AND uc.has_read = 0

Unread items by feed (the [feed_id, num_unread] pairs you asked for):

SELECT c.feed_id, f.feed_name, COUNT(1) AS num_unread
FROM user_content uc
JOIN content c ON c.id = uc.content_id
JOIN feed f ON f.id = c.feed_id
WHERE uc.user_id = :user_id
AND uc.has_read = 0
GROUP BY c.feed_id, f.feed_name

And then you just need some mechanism for marking items as read. In Google Reader's case, these are AJAX calls triggered by mouseover events, with additional links to mark everything as read, leave an item marked as unread, and so on.
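
On the server, each of those calls is just an UPDATE. A sketch against the schema above:

UPDATE user_content
SET has_read = 1
WHERE user_id = :user_id
AND content_id = :content_id;

-- "mark all as read" for one feed
UPDATE user_content
SET has_read = 1
WHERE user_id = :user_id
AND content_id IN (SELECT id FROM content WHERE feed_id = :feed_id);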

cletus
The problem you're missing is that counting unread items at runtime is extremely inefficient - O(n) in the number of unread items - and thus impractical at large scale. The approach Reader takes is to put a cap on the number of unread items it will count.
Nick Johnson
I'm not missing it at all. It's just not relevant. The number of items you can count at runtime with the above scheme is huge in real terms, and a modern database will typically use the index rather than reading the rows to get counts of unread items. Don't micro-optimize. The above is a solid design.
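For example, an index like this (name hypothetical) lets those counts be served from the index rather than the rows:

CREATE INDEX idx_user_content_unread ON user_content (user_id, has_read);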
cletus
I'll just add that it's really important you get a solid design first. Then, and only then, do you optimize it - if and only if you need to. In this case you could denormalize the model by storing unread counts on User Feed, but don't start out that way. Doing that means you have to update values in two tables when an item is read, and the values can get out of sync.
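For illustration, the denormalized write path would look something like this (unread_count being a hypothetical new column on User Feed; the application must only run the decrement when the first UPDATE actually changed a row):

BEGIN;
UPDATE user_content
SET has_read = 1
WHERE user_id = :user_id AND content_id = :content_id AND has_read = 0;
UPDATE user_feed
SET unread_count = unread_count - 1
WHERE user_id = :user_id
AND feed_id = (SELECT feed_id FROM content WHERE id = :content_id);
COMMIT;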
cletus
It's not micro-optimization. Even with an index, counting up all the items at runtime when there are potentially thousands (spread across hundreds of feeds) adds significant overhead to rendering a request. Precalculating counts when possible is much better, and failing that, specifying a reasonable cap like Reader's '1000+' is a sensible alternative.
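A sketch of that '1000+' approach - the capped count can be written so the database never scans past the cap; if the result is 1001, display "1000+":

SELECT COUNT(1) FROM (
    SELECT 1
    FROM user_content
    WHERE user_id = :user_id AND has_read = 0
    LIMIT 1001
) capped;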
Nick Johnson
You're both second-guessing the database before you've even identified a problem, which is ALWAYS a mistake. If you actually find a problem, then and only then should you start caching values. You can alter the above model to do that IF REQUIRED. Secondly, by talking about thousands of unread items, you're not being realistic about how users will use the system.
cletus