Slashdot's RSS feed is http://rss.slashdot.org/Slashdot/slashdot. If I download the XML file directly, I only get a few of today's posts. However, if I subscribe to the feed in Google Reader and keep scrolling down in its "infinite scroll" interface, it seems I can get an arbitrary number of older Slashdot posts. Maybe I can even get every Slashdot post ever?

So:

  1. How does Google Reader retrieve an unlimited number of posts from an RSS feed?
  2. How can I do the same?
+2  A: 

They have been indexing the web for years and store everything they come across. So the moment you add a "subscribe to this" link to your page, the Google crawler will start indexing that page and storing it.

For RSS they also have the benefit of having multiple people subscribing to the same feed.

So for your application I suggest saving every downloaded item locally, so that new subscribers can go back to the point in time when the first user subscribed to that feed. It won't give you unlimited history, but over time it will give you a much larger archive than just the 20 latest items.
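A minimal sketch of that local archive in Python, using only the standard library (sqlite3 and xml.etree). It assumes a plain RSS 2.0 feed whose entries are un-namespaced <item> elements; feeds published as RSS 1.0 (RDF) or Atom would need namespace-aware lookups or a parser such as feedparser. The table layout and the function names (open_archive, archive_items, history) are hypothetical. Each item is keyed by its <guid>, falling back to <link>, so downloading the same 20 items again never creates duplicates.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_archive(path=":memory:"):
    """Open (or create) a local archive of feed items (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " feed_url TEXT NOT NULL,"
        " item_id TEXT NOT NULL,"
        " title TEXT,"
        " link TEXT,"
        " fetched_at INTEGER NOT NULL,"
        " PRIMARY KEY (feed_url, item_id))"
    )
    return conn


def parse_rss_items(xml_text):
    """Yield (item_id, title, link) for each un-namespaced <item> (RSS 2.0 assumed)."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Prefer <guid>; fall back to the link if the feed omits it.
        item_id = item.findtext("guid") or link
        yield item_id, title, link


def archive_items(conn, feed_url, xml_text, now):
    """Store items not seen before; `now` is a Unix timestamp, e.g. int(time.time()).

    Returns the number of newly archived items.
    """
    new = 0
    for item_id, title, link in parse_rss_items(xml_text):
        # The primary key makes repeated items a no-op (rowcount 0).
        cur = conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?, ?)",
            (feed_url, item_id, title, link, now),
        )
        new += cur.rowcount
    conn.commit()
    return new


def history(conn, feed_url):
    """Return every archived (title, link) for a feed, in the order it was archived."""
    return conn.execute(
        "SELECT title, link FROM items WHERE feed_url = ? ORDER BY fetched_at, rowid",
        (feed_url,),
    ).fetchall()
```

A new subscriber can then be shown history(conn, feed_url), which reaches back to the first time the feed was fetched: exactly the limit described above.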

Jonas Follesø
+13  A: 

Google follows one instance of the feed for all its users, so they've been tracking and storing Slashdot articles, for example, long before any new subscriber starts reading.

To do the same, you would have to poll the RSS feeds you want at regular intervals and store any unique articles you find locally.
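A sketch of that polling loop, again assuming an un-namespaced RSS 2.0 feed and keeping the archive as a plain dict (a real version would persist it to disk). The names fetch, poll_once and poll_forever, and the 30-minute default interval, are illustrative choices, not anything Google or Slashdot specifies. Passing fetch_fn lets the loop be exercised without network access.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download the raw feed bytes (hypothetical helper: no retries, no caching)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def poll_once(xml_bytes, store):
    """Add items not seen before to `store` (dict: id -> title); return the new ids."""
    new_ids = []
    for item in ET.fromstring(xml_bytes).iter("item"):
        # Identify an item by <guid>, falling back to <link>.
        item_id = item.findtext("guid") or item.findtext("link", default="")
        if item_id and item_id not in store:
            store[item_id] = item.findtext("title", default="")
            new_ids.append(item_id)
    return new_ids


def poll_forever(url, store, interval_s=1800, fetch_fn=fetch, max_polls=None):
    """Poll `url` every `interval_s` seconds, accumulating unique items in `store`.

    `max_polls=None` polls indefinitely; a number stops after that many polls.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            poll_once(fetch_fn(url), store)
        except (OSError, ET.ParseError) as exc:
            # Network or parse errors are reported and the loop keeps going.
            print(f"poll failed: {exc}")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return store
```

For example, poll_forever("http://rss.slashdot.org/Slashdot/slashdot", {}) would keep growing the dict with each fetch; the interval only needs to be shorter than the time it takes the feed to push an item out of its window.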

scronide
+6  A: 

I just discovered that if you're authenticated you can do something like:

http://www.google.com/reader/atom/feed/http://rss.slashdot.org/Slashdot/slashdot?n=100

to get an arbitrary number of results from a feed; the n parameter sets how many entries are returned.
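The URL above is the Reader endpoint followed by the original feed URL and an n query parameter. A small sketch of composing it follows; Google Reader was shut down in 2013, so the endpoint no longer responds, and the authentication the answer mentions is not shown. reader_feed_url is a hypothetical helper that appends the feed URL unencoded, exactly as in the example, so a feed URL that already carries a query string would need escaping first.

```python
from urllib.parse import urlencode

# Endpoint prefix taken from the example URL above.
READER_BASE = "http://www.google.com/reader/atom/feed/"


def reader_feed_url(feed_url, n=100):
    """Build the Reader Atom URL for `feed_url`, requesting `n` entries."""
    return READER_BASE + feed_url + "?" + urlencode({"n": n})
```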

Horace Loeb
Does anyone know how I can access this feed from a Python script with feedparser? The entries are only available while logged in to Google Reader, and I don't know how to log in from a script...
Rafael S. Calsaverini
@Rafael - If you're still looking, see this question: http://stackoverflow.com/questions/52880/google-reader-api-unread-count. It may help.
Neal S.