I am working on a project that requires reliable access to historic feed entries not necessarily available in the current feed of the website. I have found several ways to access such data, but none of them give me all the characteristics I need.
Treat this as a brainstorm: I will list what I have found so far, and you can contribute any other ideas.
Google AJAX Feed API [http://code.google.com/apis/ajaxfeeds/] - will limit you to 250 items
Unofficial Google Reader API [http://www.niallkennedy.com/blog/2005/12/google-reader-api.html] - Perfect but unofficial and therefore unreliable (and perhaps quasi-illegal?). Also, the authentication seems to be tricky.
Spinn3r [http://spinn3r.com/] - Costs a lot of money
Spidering the Internet Archive's snapshots of the feed's site [www.archive.org] - Lots of complexity, spotty coverage, only useful as a last resort
Yahoo! Feed API [http://www.niallkennedy.com/blog/2005/12/my-yahoo-feed-a.html], Yahoo! Search BOSS [http://developer.yahoo.com/search/boss/] - The first looks more like an aggregator, meaning I'd need a separate registration for each feed. The second should give more access to Yahoo's data, but I can find no mention of feeds.
[thanks to Lou Franco] Bloglines Sync API [http://www.bloglines.com/services/api/sync] - Besides needing an account and being designed more as an aggregator, it has no way to add feeds to the account, so arbitrary feeds cannot be retrieved; you have to add them manually through the reader first.
Other search engines / blog search / whatever?
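To make the Internet Archive option a bit more concrete, here is a rough sketch of what the spidering step might look like. It assumes the Wayback Machine's CDX endpoint (`https://web.archive.org/cdx/search/cdx`) and its `id_` raw-snapshot URL form, neither of which is mentioned in the list above; the helper names are made up for illustration. It only lists archived copies of a feed URL; parsing each snapshot and de-duplicating entries would still be left to do, and coverage remains as spotty as the archive itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine CDX server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(feed_url, limit=50):
    """Build a CDX query that lists archived captures of feed_url."""
    params = {
        "url": feed_url,
        "output": "json",            # JSON rows, header row first
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "digest",        # drop consecutive identical captures
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_rows):
    """Turn CDX JSON rows into URLs of the raw archived feed documents."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts = header.index("timestamp")
    orig = header.index("original")
    # The "id_" suffix asks for the capture as originally served,
    # without the Wayback toolbar or rewritten links.
    return [
        f"https://web.archive.org/web/{row[ts]}id_/{row[orig]}"
        for row in rows
    ]


def fetch_snapshot_urls(feed_url, limit=50):
    """Hypothetical helper: query the archive over the network."""
    with urlopen(cdx_query_url(feed_url, limit), timeout=30) as resp:
        return snapshot_urls(json.load(resp))
```

Each URL returned by `fetch_snapshot_urls("http://example.com/feed")` could then be fed to an ordinary feed parser, merging entries by their id or link across snapshots.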
This is a really irritating problem as we are talking about semantic information that was once out there, is still (usually) valid, yet is difficult to access reliably, freely and without limits. Anybody know any alternative sources for feed entry goodness?