I need to create a web tool like Google Reader for my college project.

I have two questions about it:

1) How does Google Reader track read and unread posts?

2) Does Google Reader save every post in the database, or does it load the feeds on the fly?

+3  A: 
  1. Assign a hash to each feed post (i.e. date + URL + ??? = a hash that identifies a single post); a minimal sketch of this scheme follows the comments below.
  2. My guess is that it loads them on the fly, and maybe caches a limited number per user.
Femaref
So Google Reader saves the hashes in the database?
xRobot
That's probably the way it works. Remember, this is just my interpretation of the frontend and its behaviour.
Femaref
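
Here is a minimal sketch of that hashing scheme in Python. The field choice (feed URL + entry link + publication date) is only the answer's guess (the original says "date+url+???"), and every name here (post_id, mark_read, is_read) is a hypothetical stand-in for illustration. Per-user read state then reduces to a set of these hashes.

    import hashlib

    def post_id(feed_url, entry_link, published):
        """Derive a stable ID for a feed entry. The exact fields Google
        Reader hashes are unknown; date + URL is the answer's guess."""
        raw = "|".join([feed_url, entry_link, published]).encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    # Per-user read state: store only the hashes of posts the user has read.
    read_posts = {}  # user_id -> set of post hashes

    def mark_read(user_id, pid):
        read_posts.setdefault(user_id, set()).add(pid)

    def is_read(user_id, pid):
        return pid in read_posts.get(user_id, set())

Marking a post read is then mark_read("alice", post_id(feed, link, date)), and deciding whether to render a post as unread is just a set lookup.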
+1  A: 

Re #2: Google has a special RSS crawler bot called FeedFetcher. When you request an RSS feed, the bot is dispatched to retrieve it and stores the feed in a global (all-user) cache, keyed by URL. The next time the feed is requested (even by a different user, as long as the URL matches), it is served from that cache.
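
A toy version of such a URL-keyed, all-user cache might look like this in Python. The three-hour TTL and all the names here are assumptions based on the observed behaviour described below, not anything Google has documented:

    import time
    import urllib.request

    _cache = {}  # url -> (fetched_at, feed_bytes), shared by all users
    CACHE_TTL = 3 * 60 * 60  # assumed: entries seem to live a few hours

    def get_feed(url):
        """Serve a feed from the global cache, fetching it only on a
        cache miss or after the TTL expires (urlopen stands in for
        the FeedFetcher crawler here)."""
        now = time.time()
        hit = _cache.get(url)
        if hit and now - hit[0] < CACHE_TTL:
            return hit[1]
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
        _cache[url] = (now, body)
        return body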

I'm not sure what the cache invalidation mechanism is, but the crawler definitely doesn't revisit feeds as often as the responses' Cache-Control headers would indicate (that's probably a good thing, as many generated RSS feeds send no-cache even though they rarely change). This internal cache doesn't seem to persist for longer than a few hours, though.
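One plausible way to get that behaviour is to clamp whatever the server claims between a floor and a ceiling. The intervals below are invented to match the observations above, not known Google values:

    import re

    MIN_REVISIT = 30 * 60       # assumed floor: ignore no-cache / tiny max-age
    MAX_REVISIT = 3 * 60 * 60   # assumed ceiling, matching the "few hours" above

    def revisit_after(cache_control):
        """Seconds until the crawler should refetch a feed, clamping the
        server's Cache-Control max-age between the floor and ceiling."""
        max_age = None
        if cache_control:
            m = re.search(r"max-age=(\d+)", cache_control)
            if m:
                max_age = int(m.group(1))
        if max_age is None:  # covers no-cache and a missing header
            max_age = MIN_REVISIT
        return max(MIN_REVISIT, min(max_age, MAX_REVISIT))

Under these assumptions, both revisit_after("no-cache") and revisit_after("max-age=60") return 1800: the crawler ignores overly aggressive caching directives in either direction.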

(These are hypotheses I formulated some time ago from my RSS feed access logs; I still think they're valid, as I haven't seen any major change in the crawler's behavior since.)

Piskvor