I need to create a web tool like Google Reader for my college project.
I have two questions about it:
1) How does Google Reader track read and unread posts?
2) Does Google Reader save every post in the database, or does it load the feeds on demand?
Re #2: Google has a dedicated RSS crawler bot called FeedFetcher. When you request an RSS feed, the crawler is dispatched to retrieve it and stores the feed in a global (all-user) cache, keyed by URL. The next time the feed is requested (even by a different user, as long as the URL matches), it is loaded from the cache.
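For your own project, the shared-cache idea might look roughly like the sketch below. It is only an illustration of "one cache entry per feed URL, shared by all users"; the class name `SharedFeedCache`, the `fake_fetch` stand-in for the crawler, and the example URL are all made up here and say nothing about how Google actually implements it.

```python
from typing import Callable, Dict


class SharedFeedCache:
    """Hypothetical all-user cache holding one entry per feed URL."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                 # stand-in for the crawler
        self._entries: Dict[str, str] = {}  # feed URL -> cached feed body
        self.fetch_count = 0                # how many real fetches happened

    def get(self, url: str) -> str:
        # Only the first request for a URL triggers a fetch;
        # later requests (from any user) are served from the cache.
        if url not in self._entries:
            self.fetch_count += 1
            self._entries[url] = self._fetch(url)
        return self._entries[url]


def fake_fetch(url: str) -> str:
    # Assumption: a real tool would download and parse the feed here.
    return f"<rss><!-- body of {url} --></rss>"


cache = SharedFeedCache(fake_fetch)
alice_view = cache.get("http://example.com/feed.xml")  # triggers a fetch
bob_view = cache.get("http://example.com/feed.xml")    # served from cache
```

Because the key is the URL alone, two users subscribed to the same feed share a single fetch; two URLs that point to the same feed would still be cached separately.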
I'm not sure what the cache invalidation mechanisms are, but the crawler definitely doesn't revisit feeds as often as the response's Cache-Control header would indicate (that's probably a good thing, as many generated RSS feeds send no-cache even though they don't change very often). This internal cache doesn't seem to persist for longer than a few hours, though.
(These are hypotheses I formulated some time ago from my RSS feed access logs. I still think they're valid, as I haven't seen any major change in the crawler's behavior since.)
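If you want similar expiry behavior in your own tool, one simple option is a fixed maximum age per cache entry that ignores the feed's own Cache-Control header. The sketch below assumes exactly that; the three-hour `MAX_AGE_SECONDS`, the `ExpiringFeedCache` class, and the injected `clock` are illustrative choices of mine, not observed Google behavior.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

MAX_AGE_SECONDS = 3 * 60 * 60  # assumed value for "a few hours"


@dataclass
class Entry:
    body: str
    fetched_at: float  # clock reading at the time of the fetch


class ExpiringFeedCache:
    """Hypothetical URL-keyed cache with a fixed time-to-live per entry."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        clock: Callable[[], float] = time.time,
        max_age: float = MAX_AGE_SECONDS,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._max_age = max_age
        self._entries: Dict[str, Entry] = {}
        self.fetch_count = 0

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        # The feed's Cache-Control header is deliberately not consulted:
        # an entry is refetched only when it is missing or too old.
        if entry is None or now - entry.fetched_at >= self._max_age:
            entry = Entry(body=self._fetch(url), fetched_at=now)
            self._entries[url] = entry
            self.fetch_count += 1
        return entry.body
```

Passing the clock in as a parameter makes the expiry logic easy to test without waiting for real time to pass.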