I am building a news aggregation website and I am looking for a way to fetch old feeds (of any particular website) into the system. Along the way, I stumbled onto Feedjack, which is said to handle what I need, so I started diving into the source code. (I don't want to plug it into my Django project directly.) All I see is this line:

self.fpf = parse_feed(self.feed.feed_url, agent=USER_AGENT, etag=self.feed.etag) # in bin/feedjack_update.py

I am not sure how this handles historical feed parsing. What am I missing? One more question: Feedjack aside, how can I access the historical feeds of any website?

A: 

Websites generally don't serve historical feeds; a feed typically lists only the most recent entries. Unfortunately, the only way to "access" older entries is to store them yourself in a database. For common feeds, you may be able to get the history from another aggregator. Otherwise, you build up the history starting from when the feed is first added.
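As a rough illustration of building up the history yourself, here is a minimal sketch using only Python's standard `sqlite3` module. It is not how Feedjack does it; the table layout and the function names (`open_archive`, `archive_entries`, `history`) are made up for this example, and it assumes each entry has already been parsed into a dict with a unique `guid`. Run it after every poll: entries already seen are skipped, so the archive grows over time.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive database. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               feed_url  TEXT NOT NULL,
               guid      TEXT NOT NULL,
               title     TEXT,
               published TEXT,
               PRIMARY KEY (feed_url, guid)
           )"""
    )
    return conn


def _count(conn, feed_url):
    row = conn.execute(
        "SELECT COUNT(*) FROM entries WHERE feed_url = ?", (feed_url,)
    ).fetchone()
    return row[0]


def archive_entries(conn, feed_url, entries):
    """Store entries that are not archived yet; return how many were new."""
    before = _count(conn, feed_url)
    # INSERT OR IGNORE skips rows whose (feed_url, guid) is already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO entries (feed_url, guid, title, published) "
        "VALUES (?, ?, ?, ?)",
        [(feed_url, e["guid"], e.get("title"), e.get("published")) for e in entries],
    )
    conn.commit()
    return _count(conn, feed_url) - before


def history(conn, feed_url):
    """Return every archived (guid, title) for a feed, oldest first."""
    return conn.execute(
        "SELECT guid, title FROM entries WHERE feed_url = ? "
        "ORDER BY published, guid",
        (feed_url,),
    ).fetchall()
```

The entries passed to `archive_entries` would come from whatever parser you use on each poll; the feed may have dropped older items by then, but they stay in your own table.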

Karl Bielefeldt
A: 

The only option I can see is using Google Reader. There is a blog post about constructing the history of any feed. I don't want to depend on a service just to get historical feeds. Anyway, if there isn't a better option, I will go with that.

Maddy
I read the comments on the Google Reader blog post mentioned above and also tried NewsBlur.com. Both fail in the same way: you can't guarantee that an old feed entry is available unless someone has already subscribed to that website at least once. Google Reader just serves entries it has already cached. If no one has subscribed to the site before, nothing is in its cache and you can't retrieve anything from Google Reader. Game over!
Maddy